Abstract. Suppose we are given a matrix that is formed by adding an unknown sparse matrix to
an unknown low-rank matrix. Our goal is to decompose the given matrix into its sparse and low-rank 
components. Such a problem arises in a number of applications in model and system identification, 
and is NP-hard in general. In this paper we consider a convex optimization formulation for splitting
the specified matrix into its components, by minimizing a linear combination of the $\ell_1$ norm and the
nuclear norm of the components. We develop a notion of rank-sparsity incoherence, expressed as an 
uncertainty principle between the sparsity pattern of a matrix and its row and column spaces, and 
use it to characterize both fundamental identifiability as well as (deterministic) sufficient conditions 
for exact recovery. Our analysis is geometric in nature, with the tangent spaces to the algebraic 
varieties of sparse and low-rank matrices playing a prominent role. When the sparse and low-rank 
matrices are drawn from certain natural random ensembles, we show that the sufficient conditions for 
exact recovery are satisfied with high probability. We conclude with simulation results on synthetic 
matrix decomposition problems. 
1. Introduction. Complex systems and models arise in a variety of problems in
science and engineering. In many applications such complex systems and models are 
often composed of multiple simpler systems and models. Therefore, in order to better 
understand the behavior and properties of a complex system a natural approach is to 
decompose the system into its simpler components. In this paper we consider matrix 
representations of systems and statistical models in which our matrices are formed by 
adding together sparse and low-rank matrices. We study the problem of recovering the 
sparse and low-rank components given no prior knowledge about the sparsity pattern 
of the sparse matrix, or the rank of the low-rank matrix. We propose a tractable 
convex program to recover these components, and provide sufficient conditions under 
which our procedure recovers the sparse and low-rank matrices exactly. 

Such a decomposition problem arises in a number of settings, with the sparse and 
low-rank matrices having different interpretations depending on the application. In 
a statistical model selection setting, the sparse matrix can correspond to a Gaussian 
graphical model [18] and the low-rank matrix can summarize the effect of latent,
unobserved variables. Decomposing a given model into these simpler components is
useful for developing efficient estimation and inference algorithms. In computational
complexity, the notion of matrix rigidity [27] captures the smallest number of entries
of a matrix that must be changed in order to reduce the rank of the matrix below a
specified level (the changes can be of arbitrary magnitude). Bounds on the rigidity of
a matrix have several implications in complexity theory [19]. Similarly, in a system
identification setting the low-rank matrix represents a system with a small model 
order while the sparse matrix represents a system with a sparse impulse response. 
Decomposing a system into such simpler components can be used to provide a simpler, 
more efficient description. 

1.1. Our results. Formally, the decomposition problem we are interested in can
be defined as follows:

Problem. Given C = A* + B*, where A* is an unknown sparse matrix and B* is an
unknown low-rank matrix, recover A* and B* from C using no additional information
on the sparsity pattern and/or the rank of the components.

In the absence of any further assumptions, this decomposition problem is funda- 
mentally ill-posed. Indeed, there are a number of scenarios in which a unique splitting 
of C into "low-rank" and "sparse" parts may not exist; for example, the low-rank 
matrix may itself be very sparse leading to identifiability issues. In order to char- 
acterize when such a decomposition is possible we develop a notion of rank-sparsity 
incoherence, an uncertainty principle between the sparsity pattern of a matrix and 
its row/column spaces. This condition is based on quantities involving the tangent 
spaces to the algebraic variety of sparse matrices and the algebraic variety of low-rank 
matrices [16].

Two natural identifiability problems may arise. The first one occurs if the low- 
rank matrix itself is very sparse. In order to avoid such a problem we impose certain 
conditions on the row/column spaces of the low-rank matrix. Specifically, for a matrix 
M let $T(M)$ be the tangent space at M with respect to the variety of all matrices with
rank less than or equal to rank(M). Operationally, $T(M)$ is the span of all matrices
with row-space contained in the row-space of M or with column-space contained in
the column-space of M; see (3.2) for a formal characterization. Let $\xi(M)$ be defined
as follows:

$$\xi(M) \triangleq \max_{N \in T(M),\ \|N\| \le 1} \|N\|_\infty. \tag{1.1}$$

Here $\|\cdot\|$ is the spectral norm (i.e., the largest singular value), and $\|\cdot\|_\infty$ denotes
the largest entry in magnitude. Thus $\xi(M)$ being small implies that (appropriately
scaled) elements of the tangent space $T(M)$ are "diffuse", i.e., these elements are
not too sparse; as a result M cannot be very sparse. As shown in Proposition 4 (see
Section 4.3), a low-rank matrix M with row/column spaces that are not closely aligned
with the coordinate axes has small $\xi(M)$.

The other identifiability problem may arise if the sparse matrix has all its support 
concentrated in one column; the entries in this column could negate the entries of the 
corresponding low-rank matrix, thus leaving the rank and the column space of the 
low-rank matrix unchanged. To avoid such a situation, we impose conditions on the 
sparsity pattern of the sparse matrix so that its support is not too concentrated in 
any row/column. For a matrix M let $\Omega(M)$ be the tangent space at M with respect
to the variety of all matrices with number of non-zero entries less than or equal to
|support(M)|. The space $\Omega(M)$ is simply the set of all matrices that have support
contained within the support of M; see (3.4). Let $\mu(M)$ be defined as follows:

$$\mu(M) \triangleq \max_{N \in \Omega(M),\ \|N\|_\infty \le 1} \|N\|. \tag{1.2}$$

The quantity $\mu(M)$ being small for a matrix implies that the spectrum of any element
of the tangent space $\Omega(M)$ is "diffuse", i.e., the singular values of these elements are
not too large. We show in Proposition 3 (see Section 4.3) that a sparse matrix M with
"bounded degree" (a small number of non-zeros per row/column) has small $\mu(M)$.

For a given matrix M, it is impossible for both quantities $\xi(M)$ and $\mu(M)$ to be
simultaneously small. Indeed, we prove that for any matrix $M \neq 0$ we must have
that $\xi(M)\,\mu(M) \ge 1$ (see Theorem 1 in Section 3.3). Thus, this uncertainty principle
asserts that there is no non-zero matrix M with all elements in $T(M)$ being diffuse
and all elements in $\Omega(M)$ having diffuse spectra. As we describe later, the quantities
$\xi$ and $\mu$ are also used to characterize fundamental identifiability in the decomposition
problem.

In general solving the decomposition problem is NP-hard; hence, we consider 
tractable approaches employing recently well-studied convex relaxations. We formulate
a convex optimization problem for decomposition using a combination of the $\ell_1$
norm and the nuclear norm. For any matrix M the $\ell_1$ norm is given by

$$\|M\|_1 = \sum_{i,j} |M_{i,j}|,$$

and the nuclear norm, which is the sum of the singular values, is given by

$$\|M\|_* = \sum_k \sigma_k(M),$$

where $\{\sigma_k(M)\}$ are the singular values of M. The $\ell_1$ norm has been used as an
effective surrogate for the number of non-zero entries of a vector, and a number of
results provide conditions under which this heuristic recovers sparse solutions to
ill-posed inverse problems [10]. More recently, the nuclear norm has been shown to be
an effective surrogate for the rank of a matrix [13]. This relaxation is a generalization
of the previously studied trace-heuristic that was used to recover low-rank positive
semidefinite matrices [20]. Indeed, several papers demonstrate that the nuclear norm
heuristic recovers low-rank matrices in various rank minimization problems [22, 4].
Based on these results, we propose the following optimization formulation to recover
A* and B* given C = A* + B*:

$$(\hat{A}, \hat{B}) = \arg\min_{A, B}\ \gamma \|A\|_1 + \|B\|_* \quad \text{s.t.}\quad A + B = C. \tag{1.3}$$

Here $\gamma$ is a parameter that provides a trade-off between the low-rank and sparse
components. This optimization problem is convex, and can in fact be rewritten as a
semidefinite program (SDP) [28] (see Appendix A).
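
As a rough numerical illustration of (1.3), the sketch below solves the program with a simple alternating direction method of multipliers (ADMM) iteration. This is a swapped-in technique, not the SDP reformulation mentioned above. The function names, the penalty parameter `rho`, the iteration count, the synthetic test instance, and the value of $\gamma$ are all assumptions made for this example.

```python
import numpy as np

def soft_threshold(X, tau):
    """Entrywise shrinkage: proximal operator of tau * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def singular_value_threshold(X, tau):
    """Shrink singular values: proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def objective(A, B, gamma):
    """gamma * ||A||_1 + ||B||_*, the objective of (1.3)."""
    return gamma * np.abs(A).sum() + np.linalg.norm(B, "nuc")

def decompose(C, gamma, rho=1.0, iters=500):
    """ADMM sketch for: minimize gamma*||A||_1 + ||B||_*  s.t.  A + B = C."""
    A = np.zeros_like(C)
    Y = np.zeros_like(C)              # multiplier for the constraint A + B = C
    for _ in range(iters):
        B = singular_value_threshold(C - A + Y / rho, 1.0 / rho)
        A = soft_threshold(C - B + Y / rho, gamma / rho)
        Y = Y + rho * (C - A - B)
    return A, C - A                   # return an exactly feasible pair

# Hypothetical test instance: a rank-one matrix plus a few large entries.
rng = np.random.default_rng(0)
n = 30
B_true = np.outer(rng.standard_normal(n), rng.standard_normal(n))
A_true = np.zeros((n, n))
idx = rng.choice(n * n, size=40, replace=False)
A_true.flat[idx] = 5.0 * rng.choice([-1.0, 1.0], size=40)
C = A_true + B_true

gamma = 1.0 / np.sqrt(n)              # assumed trade-off value, not from the text
A_hat, B_hat = decompose(C, gamma)
print("objective at estimate:", objective(A_hat, B_hat, gamma))
print("objective at truth:   ", objective(A_true, B_true, gamma))
print("relative error in A:  ",
      np.linalg.norm(A_hat - A_true) / np.linalg.norm(A_true))
```

Whether the estimate coincides with the true pair depends on the instance and on $\gamma$; the printed relative error is only a diagnostic for this particular synthetic example.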



We prove that $(\hat{A}, \hat{B}) = (A^*, B^*)$ is the unique optimum of (1.3) for a range
of $\gamma$ if $\mu(A^*)\,\xi(B^*) < \frac{1}{6}$ (see Theorem 2 in Section 4.2). Thus, the conditions for
exact recovery of the sparse and low-rank components via the convex program (1.3)
involve the tangent-space-based quantities defined in (1.1) and (1.2). Essentially
these conditions specify that each element of $\Omega(A^*)$ must have a diffuse spectrum,
and every element of $T(B^*)$ must be diffuse. In a sense that will be made precise
later, the condition $\mu(A^*)\,\xi(B^*) < \frac{1}{6}$ required for the convex program (1.3) to provide
exact recovery is slightly tighter than that required for fundamental identifiability in 
the decomposition problem. An important feature of our result is that it provides a 
simple deterministic condition for exact recovery. In addition, note that the conditions 
only depend on the row/column spaces of the low-rank matrix B* and the support of 
the sparse matrix A*, and not the singular values of B* or the values of the non-zero
entries of A*. The reason for this is that the non-zero entries of A* and the singular
values of B* play no role in the subgradient conditions with respect to the $\ell_1$ norm
and the nuclear norm. 

In the sequel we discuss concrete classes of sparse and low-rank matrices that 
have small $\mu$ and $\xi$ respectively. We also show that when the sparse and low-rank
matrices A* and B* are drawn from certain natural random ensembles, then the
sufficient conditions of Theorem 2 are satisfied with high probability; consequently,
(1.3) provides exact recovery with high probability for such matrices.



1.2. Previous work using incoherence. The concept of incoherence was stud- 
ied in the context of recovering sparse representations of vectors from a so-called 
"overcomplete dictionary" [9]. More concretely consider a situation in which one is 
given a vector formed by a sparse linear combination of a few elements from a combined
time-frequency dictionary, i.e., a vector formed by adding a few sinusoids and
a few "spikes"; the goal is to recover the spikes and sinusoids that compose the vector
from the infinitely many possible solutions. Based on a notion of time-frequency
incoherence, the $\ell_1$ heuristic was shown to succeed in recovering sparse solutions [8].
Incoherence is also a concept that is implicitly used in recent work under the title of
compressed sensing, which aims to recover "low-dimensional" objects such as sparse
vectors [21 [TT] and low-rank matrices [22, 4] given incomplete observations. Our work
is closer in spirit to that in [S], and can be viewed as a method to recover the "simplest
explanation" of a matrix given an "overcomplete dictionary" of sparse and low-rank 
matrix atoms. 

1.3. Outline. In Section 2 we elaborate on the applications mentioned previously,
and discuss the implications of our results for each of these applications. Section 3
formally describes conditions for fundamental identifiability in the decomposition
problem based on the quantities $\xi$ and $\mu$ defined in (1.1) and (1.2). We also
provide a proof of the rank-sparsity uncertainty principle of Theorem 1. We prove
Theorem 2 in Section 4, and also provide concrete classes of sparse and low-rank
matrices that satisfy the sufficient conditions of Theorem 2. Section 5 describes the
results of simulations of our approach applied to synthetic matrix decomposition
problems. We conclude with a discussion in Section 6. The Appendix provides additional
details and proofs.

2. Applications. In this section we describe several applications that involve 
decomposing a matrix into sparse and low-rank components. 

2.1. Graphical modeling with latent variables. We begin with a problem in 
statistical model selection. In many applications large covariance matrices are approx- 
imated as low-rank matrices based on the assumption that a small number of latent 
factors explain most of the observed statistics (e.g., principal component analysis). 
Another well-studied class of models are those described by graphical models [18], in
which the inverse of the covariance matrix (also called the precision or concentration
or information matrix) is assumed to be sparse (typically this sparsity is with respect
to some graph). We describe a model selection problem involving graphical models
with latent variables. Let the covariance matrix of a collection of jointly Gaussian
variables be denoted by $\Sigma_{(o\,h)}$, where o represents observed variables and h represents
unobserved, hidden variables. The marginal statistics corresponding to the observed
variables o are given by the marginal covariance matrix $\Sigma_o$, which is simply a
submatrix of the full covariance matrix $\Sigma_{(o\,h)}$. Suppose, however, that we parameterize
our model by the information matrix given by $K_{(o\,h)} = \Sigma_{(o\,h)}^{-1}$ (such a
parameterization reveals the connection to graphical models). In such a parameterization, the
marginal information matrix corresponding to the inverse $\Sigma_o^{-1}$ is given by the Schur
complement with respect to the block $K_h$:

$$\hat{K}_o = \Sigma_o^{-1} = K_o - K_{o,h} K_h^{-1} K_{h,o}. \tag{2.1}$$

Thus if we only observe the variables o, we only have access to $\Sigma_o$ (or $\hat{K}_o$). A simple
explanation of the statistical structure underlying these variables involves recognizing
the presence of the latent, unobserved variables h. However, (2.1) has the interesting
structure that $K_o$ is often sparse due to graphical structure amongst the observed
variables o, while $K_{o,h} K_h^{-1} K_{h,o}$ has low rank if the number of latent, unobserved
variables h is small relative to the number of observed variables o (the rank is equal
to the number of latent variables h). Therefore, decomposing $\hat{K}_o$ into these sparse
and low-rank components reveals the graphical structure in the observed variables as
well as the effect due to (and the number of) the unobserved latent variables. We
discuss this application in more detail in a separate report [6]. 
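
The Schur complement identity (2.1) can be checked numerically. The sketch below builds a random positive definite joint information matrix (an assumption for illustration; its observed block is dense here, so only the algebraic identity and the rank of the latent term are being checked) and compares the inverse of the marginal covariance with the Schur complement. The variable names and block sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_hid = 8, 2               # hypothetical numbers of observed / hidden variables
p = n_obs + n_hid

# Build a positive definite joint information matrix K over (o, h).
G = rng.standard_normal((p, p))
K = G @ G.T + p * np.eye(p)
K_o = K[:n_obs, :n_obs]
K_oh = K[:n_obs, n_obs:]
K_h = K[n_obs:, n_obs:]

# Marginal covariance of the observed block and its inverse.
Sigma = np.linalg.inv(K)
Sigma_o = Sigma[:n_obs, :n_obs]
K_marg = np.linalg.inv(Sigma_o)

# Schur complement form (2.1); K_{h,o} is the transpose of K_{o,h}.
low_rank_term = K_oh @ np.linalg.solve(K_h, K_oh.T)
K_schur = K_o - low_rank_term

print(np.allclose(K_marg, K_schur))            # identity (2.1) holds
print(np.linalg.matrix_rank(low_rank_term))    # rank of the latent-variable term
```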

2.2. Matrix rigidity. The rigidity of a matrix M, denoted by $R_M(k)$, is the
smallest number of entries that need to be changed in order to reduce the rank of
M below k. Obtaining bounds on rigidity has a number of implications in complexity
theory [19], such as the trade-offs between size and depth in arithmetic circuits.
However, computing the rigidity of a matrix is in general an NP-hard problem [7].
For any $M \in \mathbb{R}^{n \times n}$ one can check that $R_M(k) \le (n - k)^2$ (this follows directly
from a Schur complement argument). Generically every $M \in \mathbb{R}^{n \times n}$ is very rigid, i.e.,
$R_M(k) = (n - k)^2$ [27], although special classes of matrices may be less rigid. We
show that the SDP (1.3) can be used to compute rigidity for certain matrices with
sufficiently small rigidity (see Section 4.4 for more details). Indeed, this convex
program (1.3) also provides a certificate of the sparse and low-rank components that form
such low-rigidity matrices; that is, the SDP (1.3) not only enables us to compute the
rigidity for certain matrices but additionally provides the changes required in order
to realize a matrix of lower rank.

2.3. Composite system identification. A decomposition problem can also 
be posed in the system identification setting. Linear time-invariant (LTI) systems 
can be represented by Hankel matrices, where the matrix represents the input-output 
relationship of the system [25]. Thus, a sparse Hankel matrix corresponds to an LTI
system with a sparse impulse response. A low-rank Hankel matrix corresponds to a
system with small model order, and provides a minimal realization for a system [T^.
Given an LTI system H as follows

$$H = H_s + H_{lr},$$

where $H_s$ is sparse and $H_{lr}$ is low-rank, obtaining a simple description of H requires
decomposing it into its simpler sparse and low-rank components. One can obtain
these components by solving our rank-sparsity decomposition problem. Note that in
practice one can impose in (1.3) the additional constraint that the sparse and low-rank
matrices have Hankel structure.
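
To make the Hankel correspondence concrete, the sketch below builds Hankel matrices from two assumed impulse responses: a sum of two decaying exponentials (a second-order system) and a response with two isolated taps. The helper `hankel_from_impulse`, the matrix size, and the particular responses are hypothetical choices for illustration.

```python
import numpy as np

def hankel_from_impulse(h, n):
    """n x n Hankel matrix with entries H[i, j] = h[i + j]."""
    i = np.arange(n)
    return h[i[:, None] + i[None, :]]

n = 8
t = np.arange(2 * n - 1)

# Assumed low-order part: impulse response of a second-order system.
h_lr = 0.9 ** t + (-0.5) ** t
# Assumed sparse part: impulse response with two isolated taps.
h_s = np.zeros(2 * n - 1)
h_s[[3, 10]] = [2.0, -1.0]

H_lr = hankel_from_impulse(h_lr, n)
H_s = hankel_from_impulse(h_s, n)
H = H_s + H_lr

print(np.linalg.matrix_rank(H_lr))     # rank matches the model order
print(np.count_nonzero(H_s), "of", n * n, "entries of H_s are non-zero")
```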



2.4. Partially coherent decomposition in optical systems. We outline an 
optics application that is described in greater detail in [12]. Optical imaging systems
are commonly modeled using the Hopkins integral [15], which gives the output intensity
at a point as a function of the input transmission via a quadratic form. In many
applications the operator in this quadratic form can be well-approximated by a (finite)
positive semi-definite matrix. Optical systems described by a low-pass filter are called
coherent imaging systems, and the corresponding system matrices have small rank.
For systems that are not perfectly coherent various methods have been proposed to
find an optimal coherent decomposition [21], and these essentially identify the best
approximation of the system matrix by a matrix of lower rank. At the other end are
incoherent optical systems that allow some high frequencies, and are characterized
by system matrices that are diagonal. As most real-world imaging systems are some
combination of coherent and incoherent, it was suggested in [12] that optical systems
are better described by a sum of coherent and incoherent systems rather than by the
best coherent (i.e., low-rank) approximation as in [21]. Thus, decomposing an imaging
system into coherent and incoherent components involves splitting the optical system
matrix into low-rank and diagonal components. Identifying these simpler components
has important applications in tasks such as optical microlithography [21, 15].



3. Rank-Sparsity Incoherence. Throughout this paper, we restrict ourselves 
to square $n \times n$ matrices to avoid cluttered notation. All our analysis extends to
rectangular $n_1 \times n_2$ matrices, if we simply replace n by $\max(n_1, n_2)$.

3.1. Identifiability issues. As described in the introduction, the matrix de- 
composition problem can be fundamentally ill-posed. We describe two situations in 
which identifiability issues arise. These examples suggest the kinds of additional con- 
ditions that are required in order to ensure that there exists a unique decomposition 
into sparse and low-rank matrices. 

First, let A* be any sparse matrix and let $B^* = e_i e_j^T$, where $e_i$ represents the i-th
standard basis vector. In this case, the low-rank matrix B* is also very sparse, and a
valid sparse-plus-low-rank decomposition might be $\hat{A} = A^* + e_i e_j^T$ and $\hat{B} = 0$. Thus,
we need conditions that ensure that the low-rank matrix is not too sparse. One way
to accomplish this is to require that the quantity $\xi(B^*)$ be small. As will be discussed
in Section 4.3, if the row and column spaces of B* are "incoherent" with respect to
the standard basis, i.e., the row/column spaces are not aligned closely with any of the
coordinate axes, then $\xi(B^*)$ is small.

Next, consider the scenario in which B* is any low-rank matrix and $A^* = -v e_1^T$
with v being the first column of B*. Thus, C = A* + B* has zeros in the first
column, rank(C) = rank(B*), and C has the same column space as B*. Therefore,
a reasonable sparse-plus-low-rank decomposition in this case might be $\hat{B} = B^* + A^*$
and $\hat{A} = 0$. Here rank($\hat{B}$) = rank(B*). Requiring that a sparse matrix A* have
small $\mu(A^*)$ avoids such identifiability issues. Indeed we show in Section 4.3 that
sparse matrices with "bounded degree" (i.e., few non-zero entries per row/column)
have small $\mu$.
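
Both identifiability failures are easy to reproduce numerically. The sketch below uses small hypothetical instances (the size, the indices, and the random rank-two matrix are assumptions) to show a matrix that is simultaneously rank one and one-sparse, and a sparse matrix that cancels a column of a low-rank matrix without changing its rank.

```python
import numpy as np

n = 6
rng = np.random.default_rng(2)

# Case 1: B* = e_i e_j^T is rank one and also has a single non-zero entry.
i, j = 1, 4
B1 = np.zeros((n, n))
B1[i, j] = 1.0
print(np.linalg.matrix_rank(B1), np.count_nonzero(B1))

# Case 2: A* = -v e_1^T cancels the first column v of a low-rank B*.
B2 = rng.standard_normal((n, 2)) @ rng.standard_normal((2, n))   # rank 2
A2 = np.zeros((n, n))
A2[:, 0] = -B2[:, 0]
C = A2 + B2
print(np.allclose(C[:, 0], 0.0))                                 # first column vanishes
print(np.linalg.matrix_rank(C), np.linalg.matrix_rank(B2))       # rank is unchanged
```

In the second case C is itself low-rank, so the pair (0, C) is as plausible a decomposition as (A*, B*).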

3.2. Tangent-space identifiability. We begin by describing the sets of sparse 
and low-rank matrices. These sets can be considered either as differentiable mani- 
folds (away from their singularities) or as algebraic varieties; we emphasize the latter 
viewpoint here. Recall that an algebraic variety is defined as the zero set of a system 
of polynomial equations [16]. The variety of rank-constrained matrices is defined as:

$$\mathcal{P}(k) \triangleq \{ M \in \mathbb{R}^{n \times n} \mid \operatorname{rank}(M) \le k \}. \tag{3.1}$$

This is an algebraic variety since it can be defined through the vanishing of all
$(k+1) \times (k+1)$ minors of the matrix M. The dimension of this variety is $k(2n - k)$,
and it is non-singular everywhere except at those matrices with rank less than or
equal to $k - 1$. For any matrix $M \in \mathbb{R}^{n \times n}$, the tangent space $T(M)$ with respect
to $\mathcal{P}(\operatorname{rank}(M))$ at M is the span of all matrices with either the same row-space as
M or the same column-space as M. Specifically, let $M = U \Sigma V^T$ be a singular value
decomposition (SVD) of M with $U, V \in \mathbb{R}^{n \times k}$, where rank(M) = k. Then we have
that

$$T(M) = \{ U X^T + Y V^T \mid X, Y \in \mathbb{R}^{n \times k} \}. \tag{3.2}$$

If rank(M) = k the dimension of $T(M)$ is $k(2n - k)$. Note that we always have
$M \in T(M)$. In the rest of this paper we view $T(M)$ as a subspace in $\mathbb{R}^{n \times n}$.

Next we consider the set of all matrices that are constrained by the size of their
support. Such sparse matrices can also be viewed as algebraic varieties:

$$\mathcal{S}(m) \triangleq \{ M \in \mathbb{R}^{n \times n} \mid |\operatorname{support}(M)| \le m \}. \tag{3.3}$$

The dimension of this variety is m, and it is non-singular everywhere except at those
matrices with support size less than or equal to $m - 1$. In fact $\mathcal{S}(m)$ can be thought
of as a union of $\binom{n^2}{m}$ subspaces, with each subspace being aligned with m of the $n^2$
coordinate axes. For any matrix $M \in \mathbb{R}^{n \times n}$, the tangent space $\Omega(M)$ with respect to
$\mathcal{S}(|\operatorname{support}(M)|)$ at M is given by

$$\Omega(M) = \{ N \in \mathbb{R}^{n \times n} \mid \operatorname{support}(N) \subseteq \operatorname{support}(M) \}. \tag{3.4}$$

If |support(M)| = m the dimension of $\Omega(M)$ is m. Note again that we always have
$M \in \Omega(M)$. As with $T(M)$, we view $\Omega(M)$ as a subspace in $\mathbb{R}^{n \times n}$. Since both $T(M)$
and $\Omega(M)$ are subspaces of $\mathbb{R}^{n \times n}$, we can compare vectors in these subspaces.

Before analyzing whether (A*, B*) can be recovered in general (for example, using
the SDP (1.3)), we ask a simpler question. Suppose that we had prior information
about the tangent spaces $\Omega(A^*)$ and $T(B^*)$, in addition to being given C = A* + B*.
Can we then uniquely recover (A*, B*) from C? Assuming such prior knowledge of
the tangent spaces is unrealistic in practice; however, we obtain useful insight into the
kinds of conditions required on sparse and low-rank matrices for exact decomposition.
Given this knowledge of the tangent spaces, a necessary and sufficient condition for
unique recovery is that the tangent spaces $\Omega(A^*)$ and $T(B^*)$ intersect transversally:

$$\Omega(A^*) \cap T(B^*) = \{0\}.$$

That is, the subspaces $\Omega(A^*)$ and $T(B^*)$ have a trivial intersection. The sufficiency of
this condition for unique decomposition is easily seen. For the necessity part, suppose
for the sake of a contradiction that a non-zero matrix M belongs to $\Omega(A^*) \cap T(B^*)$;
one can add and subtract M from A* and B* respectively while still having a valid
decomposition, which violates the uniqueness requirement. The following proposition,
proved in Appendix B, provides a simple condition in terms of the quantities $\mu(A^*)$
and $\xi(B^*)$ for the tangent spaces $\Omega(A^*)$ and $T(B^*)$ to intersect transversally.

Proposition 1. Given any two matrices A* and B*, we have that

$$\mu(A^*)\,\xi(B^*) < 1 \ \Longrightarrow\ \Omega(A^*) \cap T(B^*) = \{0\},$$

where $\xi(B^*)$ and $\mu(A^*)$ are defined in (1.1) and (1.2), and the tangent spaces $\Omega(A^*)$
and $T(B^*)$ are defined in (3.4) and (3.2).

Thus, both $\mu(A^*)$ and $\xi(B^*)$ being small implies that the tangent spaces $\Omega(A^*)$
and $T(B^*)$ intersect transversally; consequently, we can exactly recover (A*, B*) given
$\Omega(A^*)$ and $T(B^*)$. As we shall see, the condition required in Theorem 2 (see
Section 4.2) for exact recovery using the convex program (1.3) will be simply a mild
tightening of the condition required above for unique decomposition given the
tangent spaces.
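
For small instances, transversality of the two tangent spaces can be tested directly by linear algebra, without computing $\mu$ or $\xi$. The sketch below is a hypothetical check (the size, rank, support size, and random instance are assumptions): it represents both projections as matrices acting on vectorized matrices and compares the dimension of the sum of the subspaces with the sum of their dimensions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, m = 6, 1, 5          # hypothetical size, rank, and support size

# Low-rank B* and the projectors onto its column and row spaces.
B = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))
U, _, Vt = np.linalg.svd(B)
U, V = U[:, :k], Vt[:k].T
P_U, P_V = U @ U.T, V @ V.T
I = np.eye(n)

# Matrix of the projection onto T(B*), acting on row-major vectorizations:
# vec(P_U M + M P_V - P_U M P_V) = P_T vec(M).
P_T = np.kron(P_U, I) + np.kron(I, P_V) - np.kron(P_U, P_V)

# Projection onto Omega(A*): keep only the entries in a random support.
mask = np.zeros(n * n)
mask[rng.choice(n * n, size=m, replace=False)] = 1.0
P_Omega = np.diag(mask)

dim_T = np.linalg.matrix_rank(P_T)
dim_Omega = np.linalg.matrix_rank(P_Omega)
dim_sum = np.linalg.matrix_rank(np.hstack([P_T, P_Omega]))

print(dim_T, k * (2 * n - k))            # dimension of T(B*) versus k(2n - k)
print(dim_sum == dim_T + dim_Omega)      # True exactly when the intersection is {0}
```

This check scales poorly (the matrices have $n^2$ rows), which is one reason a condition stated through $\mu$ and $\xi$ is more convenient.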

3.3. Rank-sparsity uncertainty principle. Another important consequence 
of Proposition 1 is that we have an elementary proof of the following rank-sparsity
uncertainty principle.

Theorem 1. For any matrix $M \neq 0$, we have that

$$\xi(M)\,\mu(M) \ge 1,$$

where $\xi(M)$ and $\mu(M)$ are as defined in (1.1) and (1.2) respectively.

Proof: Given any $M \neq 0$ it is clear that $M \in \Omega(M) \cap T(M)$, i.e., M is an element
of both tangent spaces. However $\mu(M)\,\xi(M) < 1$ would imply from Proposition 1
that $\Omega(M) \cap T(M) = \{0\}$, which is a contradiction. Consequently, we must have that
$\mu(M)\,\xi(M) \ge 1$. □

Hence, for any matrix $M \neq 0$ both $\mu(M)$ and $\xi(M)$ cannot be simultaneously
small. Note that Proposition 1 is an assertion involving $\mu$ and $\xi$ for (in general)
different matrices, while Theorem 1 is a statement about $\mu$ and $\xi$ for the same matrix.
Essentially the uncertainty principle asserts that no matrix can be too sparse while
having "diffuse" row and column spaces. An extreme example is the matrix $e_i e_j^T$,
which has the property that $\mu(e_i e_j^T)\,\xi(e_i e_j^T) = 1$.
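
The extreme case can be verified by hand. The following short derivation is a sketch that uses only the definitions (1.1) and (1.2), together with the elementary fact that no entry of a matrix exceeds its spectral norm in magnitude.

```latex
% Worked check for M = e_i e_j^T (one non-zero entry, rank one).
\begin{align*}
\Omega(M) = \{ c\, e_i e_j^T : c \in \mathbb{R} \}
  &\quad\Rightarrow\quad \mu(M) = \max_{|c| \le 1} \| c\, e_i e_j^T \| = 1, \\
\|N\|_\infty \le \|N\| \ \text{for every } N
  &\quad\Rightarrow\quad \xi(M) \le 1, \\
M \in T(M),\ \ \|M\| = \|M\|_\infty = 1
  &\quad\Rightarrow\quad \xi(M) \ge 1 .
\end{align*}
% Hence \xi(M) = \mu(M) = 1, so the bound of Theorem 1 holds with equality.
```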

4. Exact Decomposition Using Semidefinite Programming. We begin
this section by studying the optimality conditions of the convex program (1.3), after
which we provide a proof of Theorem 2 with simple conditions that guarantee exact
decomposition. Next we discuss concrete classes of sparse and low-rank matrices that
satisfy the conditions of Theorem 2 and can thus be uniquely decomposed using (1.3).



4.1. Optimality conditions. The orthogonal projection onto the space $\Omega(A^*)$
is denoted $P_{\Omega(A^*)}$, which simply sets to zero those entries with support not inside
support(A*). The subspace orthogonal to $\Omega(A^*)$ is denoted $\Omega(A^*)^c$, and it consists
of matrices with complementary support, i.e., supported on support(A*)^c. The
projection onto $\Omega(A^*)^c$ is denoted $P_{\Omega(A^*)^c}$.

Similarly the orthogonal projection onto the space $T(B^*)$ is denoted $P_{T(B^*)}$. Letting
$B^* = U \Sigma V^T$ be the SVD of B*, we have the following explicit relation for
$P_{T(B^*)}$:

$$P_{T(B^*)}(M) = P_U M + M P_V - P_U M P_V. \tag{4.1}$$

Here $P_U = U U^T$ and $P_V = V V^T$. The space orthogonal to $T(B^*)$ is denoted $T(B^*)^\perp$,
and the corresponding projection is denoted $P_{T(B^*)^\perp}(M)$. The space $T(B^*)^\perp$
consists of matrices with row-space orthogonal to the row-space of B* and column-space
orthogonal to the column-space of B*. We have that

$$P_{T(B^*)^\perp}(M) = (I_{n \times n} - P_U)\, M\, (I_{n \times n} - P_V), \tag{4.2}$$

where $I_{n \times n}$ is the n x n identity matrix.
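
The two projection formulas (4.1) and (4.2) can be sanity-checked on a random instance. In the sketch below the function names `proj_T` and `proj_T_perp`, the matrix size, and the rank are hypothetical; the checks confirm that the two parts sum to the input, that the projection is idempotent, that the two parts are orthogonal, and that B* itself lies in its tangent space.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 7, 2                               # hypothetical size and rank

B = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))
U, _, Vt = np.linalg.svd(B)
U, V = U[:, :k], Vt[:k].T
P_U, P_V = U @ U.T, V @ V.T
I = np.eye(n)

def proj_T(M):
    """Projection onto T(B*), equation (4.1)."""
    return P_U @ M + M @ P_V - P_U @ M @ P_V

def proj_T_perp(M):
    """Projection onto the orthogonal complement of T(B*), equation (4.2)."""
    return (I - P_U) @ M @ (I - P_V)

M = rng.standard_normal((n, n))
print(np.allclose(proj_T(M) + proj_T_perp(M), M))            # parts sum to M
print(np.allclose(proj_T(proj_T(M)), proj_T(M)))             # idempotent
print(np.isclose(np.sum(proj_T(M) * proj_T_perp(M)), 0.0))   # orthogonal parts
print(np.allclose(proj_T(B), B))                             # B* lies in T(B*)
```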
Fig. 4.1. Geometric representation of optimality conditions: Existence of a dual Q. The
arrows denote orthogonal projections; every projection must satisfy a condition (according
to Proposition 2), which is described next to each arrow.



Following standard notation in convex analysis [24], we denote the subgradient
of a convex function f at a point $\hat{x}$ in its domain by $\partial f(\hat{x})$. The subgradient $\partial f(\hat{x})$
consists of all y such that

$$f(x) \ge f(\hat{x}) + \langle y, x - \hat{x} \rangle, \quad \forall x.$$

From the optimality conditions for a convex program [1], we have that (A*, B*) is an
optimum of (1.3) if and only if there exists a dual $Q \in \mathbb{R}^{n \times n}$ such that

$$Q \in \gamma\, \partial \|A^*\|_1 \quad \text{and} \quad Q \in \partial \|B^*\|_*. \tag{4.3}$$

From the characterization of the subgradient of the $\ell_1$ norm, we have that
$Q \in \gamma\, \partial \|A^*\|_1$ if and only if

$$P_{\Omega(A^*)}(Q) = \gamma\, \operatorname{sign}(A^*), \qquad \|P_{\Omega(A^*)^c}(Q)\|_\infty \le \gamma. \tag{4.4}$$

Here $\operatorname{sign}(A^*_{i,j})$ equals +1 if $A^*_{i,j} > 0$, -1 if $A^*_{i,j} < 0$, and 0 if $A^*_{i,j} = 0$. We also have
that $Q \in \partial \|B^*\|_*$ if and only if [21]

$$P_{T(B^*)}(Q) = U V^T, \qquad \|P_{T(B^*)^\perp}(Q)\| \le 1. \tag{4.5}$$

Note that these are necessary and sufficient conditions for (A*, B*) to be an optimum
of (1.3). The following proposition provides sufficient conditions for (A*, B*) to be
the unique optimum of (1.3), and it involves a slight tightening of the conditions (4.3),
(4.4), and (4.5).



Proposition 2. Suppose that C = A* + B*. Then $(\hat{A}, \hat{B}) = (A^*, B^*)$ is the
unique optimizer of (1.3) if the following conditions are satisfied:

1. $\Omega(A^*) \cap T(B^*) = \{0\}$.

2. There exists a dual $Q \in \mathbb{R}^{n \times n}$ such that
(a) $P_{T(B^*)}(Q) = U V^T$
(b) $P_{\Omega(A^*)}(Q) = \gamma\, \operatorname{sign}(A^*)$
(c) $\|P_{T(B^*)^\perp}(Q)\| < 1$
(d) $\|P_{\Omega(A^*)^c}(Q)\|_\infty < \gamma$

The proof of the proposition can be found in Appendix B. Figure 4.1 provides a
visual representation of these conditions. In particular, we see that the spaces $\Omega(A^*)$
and $T(B^*)$ intersect transversally (part (1) of Proposition 2). One can also intuitively
see that guaranteeing the existence of a dual Q with the requisite conditions (part
(2) of Proposition 2) is perhaps easier if the intersection between $\Omega(A^*)$ and $T(B^*)$
is more transverse. Note that condition (1) of this proposition essentially requires
identifiability with respect to the tangent spaces, as discussed in Section 3.2.



4.2. Sufficient conditions based on $\mu(A^*)$ and $\xi(B^*)$. Next we provide
simple sufficient conditions on A* and B* that guarantee the existence of an
appropriate dual Q (as required by Proposition 2). Given matrices A* and B* with
$\mu(A^*)\,\xi(B^*) < 1$, we have from Proposition 1 that $\Omega(A^*) \cap T(B^*) = \{0\}$, i.e.,
condition (1) of Proposition 2 is satisfied. We prove that if a slightly stronger condition
holds, there exists a dual Q that satisfies the requirements of condition (2) of
Proposition 2.

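Theorem 2. Given C = A* + B* with

$$\mu(A^*)\,\xi(B^*) < \frac{1}{6},$$

the unique optimum $(\hat{A}, \hat{B})$ of (1.3) is (A*, B*) for the following range of $\gamma$:

$$\gamma \in \left( \frac{\xi(B^*)}{1 - 4\mu(A^*)\xi(B^*)},\ \frac{1 - 3\mu(A^*)\xi(B^*)}{\mu(A^*)} \right).$$

Specifically, $\gamma = \sqrt{\frac{3\xi(B^*)}{2\mu(A^*)}}$ is always inside the above range, and thus guarantees exact
recovery of (A*, B*).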

The proof of this theorem can be found in Appendix B. The main idea behind
the proof is that we only consider candidates for the dual Q that lie in the direct
sum $\Omega(A^*) \oplus T(B^*)$ of the tangent spaces. Since $\mu(A^*)\,\xi(B^*) < \frac{1}{6}$, we have from
Proposition 1 that the tangent spaces $\Omega(A^*)$ and $T(B^*)$ have a transverse intersection,
i.e., $\Omega(A^*) \cap T(B^*) = \{0\}$. Therefore, there exists a unique element $Q \in \Omega(A^*) \oplus T(B^*)$
that satisfies $P_{T(B^*)}(Q) = U V^T$ and $P_{\Omega(A^*)}(Q) = \gamma\, \operatorname{sign}(A^*)$. The proof proceeds by
showing that if $\mu(A^*)\,\xi(B^*) < \frac{1}{6}$ then the projections of this Q onto the orthogonal
spaces $\Omega(A^*)^c$ and $T(B^*)^\perp$ are small, thus satisfying condition (2) of Proposition 2.
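
The candidate dual described above can be computed explicitly for a small instance. The sketch below is a hypothetical illustration, not the argument of Appendix B: the instance, the sizes, and the value of $\gamma$ are assumptions, and $\gamma$ is not taken from the range in Theorem 2 because $\mu$ and $\xi$ are not computed here. It solves the two equality conditions by least squares; the minimum-norm solution lies in the sum of the two tangent spaces. The remaining two norm conditions of Proposition 2 are then evaluated.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k, m = 10, 1, 6                        # hypothetical size, rank, support size
gamma = 0.5                               # assumed value, for illustration only

# Low-rank B* with compact SVD factors U, V.
B = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))
U, _, Vt = np.linalg.svd(B)
U, V = U[:, :k], Vt[:k].T
P_U, P_V, I = U @ U.T, V @ V.T, np.eye(n)

# Sparse A* with a random support and random signs.
A = np.zeros((n, n))
A.flat[rng.choice(n * n, size=m, replace=False)] = rng.choice([-1.0, 1.0], size=m)
mask = (A != 0).astype(float)

# Projections as matrices acting on row-major vectorizations.
P_T = np.kron(P_U, I) + np.kron(I, P_V) - np.kron(P_U, P_V)
P_Omega = np.diag(mask.ravel())

# Solve P_T q = vec(U V^T) and P_Omega q = gamma * vec(sign(A*)).
# The minimum-norm least-squares solution lies in Omega(A*) + T(B*).
lhs = np.vstack([P_T, P_Omega])
rhs = np.concatenate([(U @ V.T).ravel(), gamma * np.sign(A).ravel()])
q, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
Q = q.reshape(n, n)

# Conditions (c) and (d) of Proposition 2, evaluated for this instance.
spec = np.linalg.norm((I - P_U) @ Q @ (I - P_V), 2)
linf = np.max(np.abs(Q * (1.0 - mask)))
print("||P_{T^perp}(Q)||      =", spec, "(need < 1)")
print("||P_{Omega^c}(Q)||_inf =", linf, "(need <", gamma, ")")
```

Whether the two printed quantities satisfy their bounds depends on the instance and on $\gamma$; when both do, Q certifies that (A*, B*) is the unique optimum of (1.3) for that $\gamma$.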

Remarks. One consequence of Theorem 2 is that if $\mu(A^*)\,\xi(B^*) < \frac{1}{6}$, then there
exists no other (A, B) such that A + B = A* + B* with $\mu(A)\,\xi(B) < \frac{1}{6}$. We consider this
implication locally around (A*, B*). Recall that the quantities $\mu(A^*)$ and $\xi(B^*)$ are
defined with respect to the tangent spaces $\Omega(A^*)$ and $T(B^*)$. Suppose B* is slightly
perturbed along the variety of rank-constrained matrices to some B. This ensures
that the tangent space varies smoothly from $T(B^*)$ to $T(B)$, and consequently that
$\xi(B) \approx \xi(B^*)$. However, compensating for this by changing A* to A* + (B* - B) moves
A* outside the variety of sparse matrices. This is because B* - B is not sparse. Thus
the dimension of the tangent space $\Omega(A^* + B^* - B)$ is much greater than that of the
tangent space $\Omega(A^*)$, as a result of which $\mu(A^* + B^* - B) \gg \mu(A^*)$; therefore we have
that $\xi(B)\,\mu(A^* + B^* - B) \gg \frac{1}{6}$. The same reasoning holds in the opposite scenario.
Consider perturbing A* slightly along the variety of sparse matrices to some A. While
this ensures that $\mu(A) \approx \mu(A^*)$, changing B* to B* + (A* - A) moves B* outside the
variety of rank-constrained matrices. Therefore the dimension of the tangent space
$T(B^* + A^* - A)$ is greater than that of $T(B^*)$, resulting in $\xi(B^* + A^* - A) \gg \xi(B^*)$;
consequently we have that $\mu(A)\,\xi(B^* + A^* - A) \gg \frac{1}{6}$.

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4.3. Sparse and low-rank matrices with ii{A*)£,{B*) < g. We discuss con- 
crete classes of sparse and low-rank matrices that satisfy the sufficient condition of 
Theorem |2] for exact decomposition. We begin by showing that sparse matrices with 
"bounded degree", i.e., bounded number of non- zeros per row/column, have small /i. 

Proposition 3. Let $A \in \mathbb{R}^{n \times n}$ be any matrix with at most $\deg_{\max}(A)$ non-zero
entries per row/column, and with at least $\deg_{\min}(A)$ non-zero entries per row/column.
With $\mu(A)$ as defined in (1.2), we have that

\[
\deg_{\min}(A) \le \mu(A) \le \deg_{\max}(A).
\]

See Appendix B for the proof. Note that if $A \in \mathbb{R}^{n \times n}$ has full support, i.e.,
$\Omega(A) = \mathbb{R}^{n \times n}$, then $\mu(A) = n$. Therefore, a constraint on the number of zeros per
row/column provides a useful bound on $\mu$. We emphasize here that simply bounding
the number of non-zero entries in $A$ does not suffice; the sparsity pattern also plays a
role in determining the value of $\mu$.
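The bounds of Proposition 3 can be checked numerically on small examples. The sketch below relies on the reduction used in the proof in Appendix B (via the Perron-Frobenius theorem), namely that it suffices to consider the 0/1 indicator matrix of the support, so that $\mu(A)$ is taken to be the spectral norm of that indicator matrix. The helper name `mu_from_support` and the random test matrix are assumptions made for illustration.

```python
import numpy as np

def mu_from_support(A):
    """Spectral norm of the 0/1 support pattern of A, with the degree bounds.

    Assumes, following the proof of Proposition 3, that mu(A) equals the
    spectral norm of the indicator matrix of the support of A.
    """
    S = (A != 0).astype(float)
    mu = np.linalg.norm(S, 2)          # largest singular value
    row_deg = S.sum(axis=1)            # non-zeros per row
    col_deg = S.sum(axis=0)            # non-zeros per column
    deg_min = min(row_deg.min(), col_deg.min())
    deg_max = max(row_deg.max(), col_deg.max())
    return mu, deg_min, deg_max

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8)) * (rng.random((8, 8)) < 0.3)
mu, deg_min, deg_max = mu_from_support(A)
print(deg_min, mu, deg_max)
```

For a matrix with full support the same function returns `mu` equal to `n`, matching the remark following the proposition.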

Next we consider low-rank matrices that have small $\xi$. Specifically, we show that
matrices with row and column spaces that are incoherent with respect to the standard
basis have small $\xi$. We measure the incoherence of a subspace $S \subseteq \mathbb{R}^n$ as follows:

\[
\beta(S) \triangleq \max_i \|P_S e_i\|_2, \tag{4.6}
\]

where $e_i$ is the $i$'th standard basis vector, $P_S$ denotes the projection onto the subspace
$S$, and $\|\cdot\|_2$ denotes the vector $\ell_2$ norm. This definition of incoherence also played an
important role in the results in [4]. A small value of $\beta(S)$ implies that the subspace $S$
is not closely aligned with any of the coordinate axes. In general for any $k$-dimensional
subspace $S$, we have that

\[
\sqrt{\frac{k}{n}} \le \beta(S) \le 1,
\]

where the lower bound is achieved, for example, by a subspace that spans any $k$
columns of an $n \times n$ orthonormal Hadamard matrix, while the upper bound is achieved
by any subspace that contains a standard basis vector. Based on the definition of $\beta(S)$,
we define the incoherence of the row/column spaces of a matrix $B \in \mathbb{R}^{n \times n}$ as

\[
\mathrm{inc}(B) \triangleq \max\{\beta(\text{row-space}(B)), \; \beta(\text{column-space}(B))\}. \tag{4.7}
\]

If the SVD of $B$ is $B = U\Sigma V^T$, then row-space$(B)$ = span$(V)$ and column-space$(B)$ =
span$(U)$. We show in Appendix B that matrices with incoherent row/column spaces
have small $\xi$; the proof technique for the lower bound here was suggested by Ben
Recht [23].
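The definitions (4.6) and (4.7) translate directly into a few lines of linear algebra. In the sketch below, the subspace is represented by a matrix with orthonormal columns, in which case $\|P_S e_i\|_2$ is the Euclidean norm of the $i$'th row of that matrix. The function names `beta` and `inc` and the relative rank tolerance `tol` are assumptions of this example, and the input is assumed to be non-zero.

```python
import numpy as np

def beta(basis):
    """beta(S) = max_i ||P_S e_i||_2 for S spanned by orthonormal columns of `basis`."""
    # P_S e_i = basis @ basis[i, :], and its 2-norm equals the norm of row i.
    return np.linalg.norm(basis, axis=1).max()

def inc(B, tol=1e-10):
    """Incoherence of the row/column spaces of a non-zero matrix B, as in (4.7)."""
    U, s, Vt = np.linalg.svd(B)
    k = int((s > tol * s.max()).sum())      # numerical rank
    return max(beta(U[:, :k]), beta(Vt[:k, :].T))

n = 4
flat = np.ones((n, n))                      # rank one, spread evenly over coordinates
spike = np.zeros((n, n)); spike[0, 0] = 1.0 # the matrix e_1 e_1^T
print(inc(flat), inc(spike))
```

The two test matrices sit at the two extremes of the bound $\sqrt{k/n} \le \beta(S) \le 1$ for $k = 1$.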



Proposition 4. Let $B \in \mathbb{R}^{n \times n}$ be any matrix with $\mathrm{inc}(B)$ defined as in (4.7),
and $\xi(B)$ defined as in (1.1). We have that

\[
\mathrm{inc}(B) \le \xi(B) \le 2\,\mathrm{inc}(B).
\]

If $B \in \mathbb{R}^{n \times n}$ is a full-rank matrix or a matrix such as $e_1 e_1^T$, then $\xi(B) = 1$.
Therefore, a bound on the incoherence of the row/column spaces of $B$ is important
in order to bound $\xi$. Using Propositions 3 and 4 along with Theorem 2 we have the
following corollary, which states that sparse bounded-degree matrices and low-rank
matrices with incoherent row/column spaces can be uniquely decomposed.

Corollary 3. Let $C = A^* + B^*$ with $\deg_{\max}(A^*)$ being the maximum number of
nonzero entries per row/column of $A^*$ and $\mathrm{inc}(B^*)$ being the maximum incoherence
of the row/column spaces of $B^*$ (as defined by (4.7)). If we have that

\[
\deg_{\max}(A^*)\,\mathrm{inc}(B^*) < \frac{1}{12},
\]

then the unique optimum of the convex program (1.3) is $(\hat{A}, \hat{B}) = (A^*, B^*)$ for a range
of values of $\gamma$:

\[
\gamma \in \left( \frac{2\,\mathrm{inc}(B^*)}{1 - 8\deg_{\max}(A^*)\,\mathrm{inc}(B^*)}, \; \frac{1 - 6\deg_{\max}(A^*)\,\mathrm{inc}(B^*)}{\deg_{\max}(A^*)} \right). \tag{4.8}
\]

Specifically, $\gamma = \sqrt{\frac{3\,\mathrm{inc}(B^*)}{\deg_{\max}(A^*)}}$ is always inside the above range, and thus guarantees
exact recovery of $(A^*, B^*)$.

We emphasize that this is a result with deterministic sufficient conditions on exact 
decomposability. 

4.4. Decomposing random sparse and low-rank matrices. Next we show 
that sparse and low-rank matrices drawn from certain natural random ensembles 
satisfy the sufficient conditions of Corollary 3 with high probability. We first consider
random sparse matrices with a fixed number of non-zero entries. 

Random sparsity model. The matrix $A^*$ is such that support$(A^*)$ is chosen uniformly
at random from the collection of all support sets of size $m$. There is no
assumption made about the values of $A^*$ at locations specified by support$(A^*)$.

Lemma 1. Suppose that $A^* \in \mathbb{R}^{n \times n}$ is drawn according to the random sparsity
model with $m$ non-zero entries. Let $\deg_{\max}(A^*)$ be the maximum number of non-zero
entries in each row/column of $A^*$. We have that

\[
\deg_{\max}(A^*) \lesssim \frac{m}{n}\log(n),
\]

with high probability.
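The random sparsity model is easy to simulate, which gives a quick way to compare the observed maximum degree with the scale $\frac{m}{n}\log n$ from Lemma 1. The sketch below is illustrative only: the function names, the Gaussian values placed on the support (the model itself leaves the values unconstrained), and the choices of `n` and `m` are assumptions. Since the lemma hides a constant, the code prints both quantities rather than asserting an inequality between them.

```python
import numpy as np

def random_sparse(n, m, rng):
    """n x n matrix whose support is a uniformly random set of m entries."""
    flat_idx = rng.choice(n * n, size=m, replace=False)
    A = np.zeros(n * n)
    A[flat_idx] = rng.standard_normal(m)   # values are arbitrary under the model
    return A.reshape(n, n)

def deg_max(A):
    """Maximum number of non-zero entries in any row or column of A."""
    S = A != 0
    return int(max(S.sum(axis=1).max(), S.sum(axis=0).max()))

rng = np.random.default_rng(1)
n, m = 200, 400
A = random_sparse(n, m, rng)
print("observed deg_max:", deg_max(A))
print("(m / n) * log(n):", (m / n) * np.log(n))
```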

The proof of this lemma follows from a standard balls and bins argument, and 
can be found in several references (see for example 

Next we consider low-rank matrices in which the singular vectors are chosen uniformly
at random from the set of all partial isometries. Such a model was considered
in recent work on the matrix completion problem [4], which aims to recover a low-rank
matrix given observations of a subset of entries of the matrix.

Random orthogonal model. A rank-$k$ matrix $B^* \in \mathbb{R}^{n \times n}$ with SVD $B^* =
U\Sigma V^T$ is constructed as follows: The singular vectors $U, V \in \mathbb{R}^{n \times k}$ are drawn uniformly
at random from the collection of rank-$k$ partial isometries in $\mathbb{R}^{n \times k}$. The choices of
$U$ and $V$ need not be mutually independent. No restriction is placed on the singular
values.

As shown in low-rank matrices drawn from such a model have incoherent 
row/column spaces. 
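One common way to realize the random orthogonal model numerically is to orthonormalize a Gaussian matrix. The sketch below takes that route; it is an assumption of this example rather than a construction given in the text. The function names and the uniformly drawn singular values are likewise hypothetical, since the model places no restriction on the singular values.

```python
import numpy as np

def random_partial_isometry(n, k, rng):
    """n x k matrix with orthonormal columns, from the QR factorization of a Gaussian matrix."""
    G = rng.standard_normal((n, k))
    Q, R = np.linalg.qr(G)                 # reduced QR: Q is n x k
    return Q * np.sign(np.diag(R))         # fix column signs so the draw does not depend on QR conventions

def random_low_rank(n, k, rng):
    U = random_partial_isometry(n, k, rng)
    V = random_partial_isometry(n, k, rng)
    s = rng.uniform(1.0, 2.0, size=k)      # arbitrary positive singular values (hypothetical choice)
    return U, s, V, U @ np.diag(s) @ V.T

rng = np.random.default_rng(2)
n, k = 50, 3
U, s, V, B = random_low_rank(n, k, rng)
# Incoherence of the row/column spaces: largest row norm of U or V.
inc_B = max(np.linalg.norm(U, axis=1).max(), np.linalg.norm(V, axis=1).max())
print(inc_B, np.sqrt(k / n))
```

The printed pair compares the sampled incoherence with the smallest value any $k$-dimensional subspace can attain.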

Lemma 2. Suppose that a rank-$k$ matrix $B^* \in \mathbb{R}^{n \times n}$ is drawn according to
the random orthogonal model. Then we have that $\mathrm{inc}(B^*)$ (defined by (4.7)) is
bounded as

\[
\mathrm{inc}(B^*) \lesssim \sqrt{\frac{\max(k, \log n)}{n}},
\]

with very high probability.

Applying these two results in conjunction with Corollary 3 we have that sparse
and low-rank matrices drawn from the random sparsity model and the random
orthogonal model can be uniquely decomposed with high probability.

Corollary 4. Suppose that a rank-$k$ matrix $B^* \in \mathbb{R}^{n \times n}$ is drawn from the
random orthogonal model, and that $A^* \in \mathbb{R}^{n \times n}$ is drawn from the random sparsity
model with $m$ non-zero entries. Given $C = A^* + B^*$, there exists a range of values
for $\gamma$ (given by (4.8)) so that $(\hat{A}, \hat{B}) = (A^*, B^*)$ is the unique optimum of the SDP
(1.3) with high probability provided

\[
m \lesssim \frac{n^{1.5}}{\log n \, \sqrt{\max(k, \log n)}}.
\]



Thus, for matrices $B^*$ with rank $k$ smaller than $n$ the SDP (1.3) yields exact
recovery with high probability even when the size of the support of $A^*$ is super-linear
in $n$. During final preparation of this manuscript we learned of related contemporaneous
work [30] that specifically studies the problem of decomposing random sparse
and low-rank matrices. In addition to the assumptions of our random sparsity and
random orthogonal models, [30] also requires that the non-zero entries of $A^*$ have
independently chosen signs that are $\pm 1$ with equal probability, while the left and
right singular vectors of $B^*$ are chosen independent of each other. For this particular
specialization of our more general framework, the results in [30] improve upon our
bound in Corollary 4.

Implications for the matrix rigidity problem. Corollary 4 has implications for the
matrix rigidity problem discussed in Section 2. Recall that $R_M(k)$ is the smallest
number of entries of $M$ that need to be changed to reduce the rank of $M$ below $k$ (the
changes can be of arbitrary magnitude). A generic matrix $M \in \mathbb{R}^{n \times n}$ has rigidity
$R_M(k) = (n - k)^2$ [57]. However, special structured classes of matrices can have low
rigidity. Consider a matrix $M$ formed by adding a sparse matrix drawn from the
random sparsity model with support size $O(\frac{n}{\log n})$, and a low-rank matrix drawn from
the random orthogonal model with rank $\epsilon n$ for some fixed $\epsilon > 0$. Such a matrix has
rigidity $R_M(\epsilon n) = O(\frac{n}{\log n})$, and one can recover the sparse and low-rank components
that compose $M$ with high probability by solving the SDP (1.3). To see this, note
that

\[
\frac{n}{\log n} \lesssim \frac{n^{1.5}}{\log n \, \sqrt{\max(\epsilon n, \log n)}} \sim \frac{n^{1.5}}{\log n \, \sqrt{\epsilon n}},
\]

which satisfies the sufficient condition of Corollary 4 for exact recovery. Therefore,
while the rigidity of a matrix is NP-hard to compute in general [7], for such low-rigidity
matrices $M$ one can compute the rigidity $R_M(\epsilon n)$; in fact the SDP (1.3) provides a
certificate of the sparse and low-rank matrices that form the low rigidity matrix $M$.

5. Simulation Results. We confirm the theoretical predictions in this paper 
with some simple experimental results. We also present a heuristic to choose the 
trade-off parameter $\gamma$. All our simulations were performed using YALMIP [31] and
the SDPT3 software [26] for solving SDPs.

Fig. 5.1. For each value of $m, k$, we generate $25 \times 25$ random $m$-sparse $A^*$ and random rank-$k$
$B^*$ and attempt to recover $(A^*, B^*)$ from $C = A^* + B^*$ using (1.3). For each value of $m, k$ we
repeated this procedure 10 times. The figure shows the probability of success in recovering $(A^*, B^*)$
using (1.3) for various values of $m$ and $k$. White represents a probability of success of 1, while black
represents a probability of success of 0. [Plot omitted; one axis is labeled $k$ = rank$(B^*)$.]

Fig. 5.2. Comparison between $\mathrm{tol}_t$ and $\mathrm{diff}_t$ for a randomly generated example with $n = 25$, $m = 25$, $k = 2$. [Plot omitted.]

In the first experiment we generate random $25 \times 25$ matrices according to the
random sparsity and random orthogonal models described in Section 4.4. To generate
a random rank-$k$ matrix $B^*$ according to the random orthogonal model, we generate
$X, Y \in \mathbb{R}^{25 \times k}$ with i.i.d. Gaussian entries and set $B^* = XY^T$. To generate an $m$-sparse
matrix $A^*$ according to the random sparsity model, we choose a support set
of size $m$ uniformly at random and the values within this support are i.i.d. Gaussian.
The goal is to recover $(A^*, B^*)$ from $C = A^* + B^*$ using the SDP (1.3). Let $\mathrm{tol}_\gamma$ be
defined as:



\[
\mathrm{tol}_\gamma = \frac{\|\hat{A} - A^*\|_F}{\|A^*\|_F} + \frac{\|\hat{B} - B^*\|_F}{\|B^*\|_F}, \tag{5.1}
\]

where $(\hat{A}, \hat{B})$ is the solution of (1.3), and $\|\cdot\|_F$ is the Frobenius norm. We declare
success in recovering $(A^*, B^*)$ if $\mathrm{tol}_\gamma < 10^{-3}$. (We discuss the issue of choosing $\gamma$
in the next experiment.) Figure 5.1 shows the success rate in recovering $(A^*, B^*)$
for various values of $m$ and $k$ (averaged over 10 experiments for each $m, k$). Thus
we see that one can recover sufficiently sparse $A^*$ and sufficiently low-rank $B^*$ from
$C = A^* + B^*$ using (1.3).
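The success criterion based on (5.1) is straightforward to compute once a solver has returned a candidate pair. The sketch below is a minimal illustration; the function names and the default `threshold` are assumptions of this example (the threshold should be set to whatever tolerance the experiment uses), and the solver that produces `A_hat` and `B_hat` is not shown.

```python
import numpy as np

def tol(A_hat, B_hat, A_star, B_star):
    """Sum of relative Frobenius-norm errors of the two recovered components, as in (5.1)."""
    err_A = np.linalg.norm(A_hat - A_star, "fro") / np.linalg.norm(A_star, "fro")
    err_B = np.linalg.norm(B_hat - B_star, "fro") / np.linalg.norm(B_star, "fro")
    return err_A + err_B

def is_success(A_hat, B_hat, A_star, B_star, threshold=1e-3):
    """Declare success when the combined relative error is below the threshold (assumed value)."""
    return tol(A_hat, B_hat, A_star, B_star) < threshold

# Toy check with hand-made matrices (not experimental data).
A_star = np.eye(2)
B_star = np.ones((2, 2))
print(tol(A_star, B_star, A_star, B_star))          # exact recovery
print(tol(2 * A_star, B_star, A_star, B_star))      # A is off by 100% in relative terms
```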



Next we consider the problem of choosing the trade-off parameter $\gamma$. Based on
Theorem 2 we know that exact recovery is possible for a range of $\gamma$. Therefore, one
can simply check the stability of the solution $(\hat{A}, \hat{B})$ as $\gamma$ is varied without knowing
the appropriate range for $\gamma$ in advance. To formalize this scheme we consider the
following SDP for $t \in [0, 1]$, which is a slightly modified version of (1.3):

\[
(\hat{A}_t, \hat{B}_t) = \arg\min_{A, B} \; t\|A\|_1 + (1 - t)\|B\|_* \quad \text{s.t.} \quad A + B = C. \tag{5.2}
\]

There is a one-to-one correspondence between (1.3) and (5.2) given by $t = \frac{\gamma}{1 + \gamma}$.
The benefit in looking at (5.2) is that the range of valid parameters is compact, i.e.,
$t \in [0, 1]$, as opposed to the situation in (1.3) where $\gamma \in [0, \infty)$. We compute the
difference between solutions for some $t$ and $t - \epsilon$ as follows:

\[
\mathrm{diff}_t = \left(\|\hat{A}_{t-\epsilon} - \hat{A}_t\|_F\right) + \left(\|\hat{B}_{t-\epsilon} - \hat{B}_t\|_F\right), \tag{5.3}
\]

where $\epsilon > 0$ is some small fixed constant, say $\epsilon = 0.01$. We generate a random
$A^* \in \mathbb{R}^{25 \times 25}$ that is 25-sparse and a random $B^* \in \mathbb{R}^{25 \times 25}$ with rank = 2 as described
above. Given $C = A^* + B^*$, we solve (5.2) for various values of $t$. Figure 5.2 shows
two curves: one is $\mathrm{tol}_t$ (which is defined analogous to $\mathrm{tol}_\gamma$ in (5.1)) and the other is
$\mathrm{diff}_t$. Clearly we do not have access to $\mathrm{tol}_t$ in practice. However, we see that $\mathrm{diff}_t$
is near-zero in exactly three regions. For sufficiently small $t$ the optimal solution to
(5.2) is $(\hat{A}_t, \hat{B}_t) = (A^* + B^*, 0)$, while for sufficiently large $t$ the optimal solution is
$(\hat{A}_t, \hat{B}_t) = (0, A^* + B^*)$. As seen in the figure, $\mathrm{diff}_t$ stabilizes for small and large $t$.
The third "middle" range of stability is where we typically have $(\hat{A}_t, \hat{B}_t) = (A^*, B^*)$.
Notice that outside of these three regions $\mathrm{diff}_t$ is not close to 0 and in fact changes
rapidly. Therefore if a reasonable guess for $t$ (or $\gamma$) is not available, one could solve
(5.2) for a range of $t$ and choose a solution corresponding to the "middle" range in
which $\mathrm{diff}_t$ is stable and near zero. A related method to check for stability is to
compute the sensitivity of the cost of the optimal solution with respect to $\gamma$, which
can be obtained from the dual solution.
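The heuristic above can be sketched in a few lines once the values of $\mathrm{diff}_t$ have been computed on a grid of $t$. The code below shows only the bookkeeping: the reparametrization between $t$ and $\gamma$, and the selection of a stable run that touches neither end of the grid. All function names and the `threshold` argument are hypothetical, the list `diffs` is a made-up toy sequence rather than experimental output, and picking the longest interior run is a design choice of this example.

```python
def gamma_to_t(gamma):
    """Map the trade-off parameter of (1.3) to that of (5.2)."""
    return gamma / (1.0 + gamma)

def t_to_gamma(t):
    """Inverse map, valid for 0 <= t < 1."""
    return t / (1.0 - t)

def stable_runs(diffs, threshold):
    """Inclusive (start, end) index pairs of maximal runs with diff below threshold."""
    runs, start = [], None
    for i, d in enumerate(diffs):
        if d < threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(diffs) - 1))
    return runs

def middle_run(diffs, threshold):
    """Longest stable run touching neither end of the grid, or None if there is none."""
    interior = [r for r in stable_runs(diffs, threshold)
                if r[0] > 0 and r[1] < len(diffs) - 1]
    return max(interior, key=lambda r: r[1] - r[0]) if interior else None

# Toy sequence (synthetic, for illustration only): three flat regions separated by jumps.
diffs = [0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 7.0, 0.0]
print(stable_runs(diffs, 1e-3), middle_run(diffs, 1e-3))
```

Any $t$ whose index falls in the returned run is a candidate; `t_to_gamma` converts it back to the parametrization of (1.3).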

6. Discussion. We have studied the problem of exactly decomposing a given 
matrix C = A* + B* into its sparse and low-rank components A* and B* . This 
problem arises in a number of applications in model selection, system identification, 
complexity theory, and optics. We characterized fundamental identifiability in the 
decomposition problem based on a notion of rank-sparsity incoherence, which relates 
the sparsity pattern of a matrix and its row/column spaces via an uncertainty
principle. As the general decomposition problem is NP-hard we propose a natural SDP
relaxation (1.3) to solve the problem, and provide sufficient conditions on sparse and
low-rank matrices so that the SDP exactly recovers such matrices. Our sufficient 
conditions are deterministic in nature; they essentially require that the sparse matrix 
must have support that is not too concentrated in any row/column, while the low-rank 
matrix must have row/column spaces that are not closely aligned with the coordinate 
axes. Our analysis centers around studying the tangent spaces with respect to the 
algebraic varieties of sparse and low-rank matrices. Indeed the sufficient conditions 
for identifiability and for exact recovery using the SDP can also be viewed as requiring 
that certain tangent spaces have a transverse intersection. We also demonstrated the 
implications of our results for the matrix rigidity problem. 

An interesting problem for further research is the development of special-purpose
algorithms that take advantage of structure in (1.3) to provide a more efficient solution
than a general-purpose SDP solver. Another question that arises in applications such
as model selection (due to noise or finite sample effects) is to approximately decompose
a matrix into sparse and low-rank components.

Acknowledgments. The authors would like to thank Dr. Benjamin Recht and 
Prof. Maryam Fazel for helpful discussions. 



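Appendix A. SDP formulation. The problem (1.3) can be recast as a semidefinite
program (SDP). We appeal to the fact that the spectral norm $\|\cdot\|$ is the dual
norm of the nuclear norm $\|\cdot\|_*$:

\[
\|M\|_* = \max\{\mathrm{trace}(M^T Y) \mid \|Y\| \le 1\}.
\]

Further, the spectral norm admits a simple semidefinite characterization [22]:

\[
\|Y\| = \min_t \; t \quad \text{s.t.} \quad \begin{pmatrix} t I_n & Y \\ Y^T & t I_n \end{pmatrix} \succeq 0.
\]

From duality, we can obtain the following SDP characterization of the nuclear norm:

\[
\|M\|_* = \min_{W_1, W_2} \; \frac{1}{2}\left(\mathrm{trace}(W_1) + \mathrm{trace}(W_2)\right) \quad \text{s.t.} \quad \begin{pmatrix} W_1 & M \\ M^T & W_2 \end{pmatrix} \succeq 0.
\]

Putting these facts together, (1.3) can be rewritten as

\[
\begin{aligned}
\min_{A, B, W_1, W_2, Z} \quad & \gamma\, \mathbf{1}_n^T Z \mathbf{1}_n + \frac{1}{2}\left(\mathrm{trace}(W_1) + \mathrm{trace}(W_2)\right) \\
\text{s.t.} \quad & \begin{pmatrix} W_1 & B \\ B^T & W_2 \end{pmatrix} \succeq 0, \\
& -Z_{i,j} \le A_{i,j} \le Z_{i,j}, \quad \forall (i, j), \\
& A + B = C.
\end{aligned} \tag{A.1}
\]

Here, $\mathbf{1}_n \in \mathbb{R}^n$ refers to the vector that has 1 in every entry.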
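The SDP characterization of the nuclear norm can be sanity-checked numerically without an SDP solver. The sketch below builds one feasible pair $(W_1, W_2)$ from the SVD $M = U\Sigma V^T$, namely $W_1 = U\Sigma U^T$ and $W_2 = V\Sigma V^T$, and checks that the block matrix is positive semidefinite and that the objective equals $\|M\|_*$. This only exhibits a feasible point attaining the value; it does not by itself show that no smaller value is possible. The random test matrix is an assumption of the example.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))

U, s, Vt = np.linalg.svd(M)
W1 = U @ np.diag(s) @ U.T          # U Sigma U^T
W2 = Vt.T @ np.diag(s) @ Vt        # V Sigma V^T

# Block matrix [[W1, M], [M^T, W2]] equals [U; V] Sigma [U; V]^T, hence PSD.
block = np.block([[W1, M], [M.T, W2]])
min_eig = np.linalg.eigvalsh(block).min()

objective = 0.5 * (np.trace(W1) + np.trace(W2))
nuclear_norm = s.sum()
print(min_eig, objective, nuclear_norm)
```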
Appendix B. Proofs. 

Proof of Proposition 1. We begin by establishing that

\[
\max_{N \in T(B^*),\, \|N\| \le 1} \|P_{\Omega(A^*)}(N)\| < 1 \;\Longrightarrow\; \Omega(A^*) \cap T(B^*) = \{0\}, \tag{B.1}
\]

where $P_{\Omega(A^*)}(N)$ denotes the projection onto the space $\Omega(A^*)$. Assume for the sake
of a contradiction that this assertion is not true. Thus, there exists $N \ne 0$ such that
$N \in \Omega(A^*) \cap T(B^*)$. Scale $N$ appropriately such that $\|N\| = 1$. Thus $N \in T(B^*)$
with $\|N\| = 1$, but we also have that $\|P_{\Omega(A^*)}(N)\| = \|N\| = 1$ as $N \in \Omega(A^*)$. This
leads to a contradiction.
Next, we show that

\[
\max_{N \in T(B^*),\, \|N\| \le 1} \|P_{\Omega(A^*)}(N)\| \le \mu(A^*)\xi(B^*),
\]

which would allow us to conclude the proof of this proposition. We have the following
sequence of inequalities

\[
\begin{aligned}
\max_{N \in T(B^*),\, \|N\| \le 1} \|P_{\Omega(A^*)}(N)\| &\le \max_{N \in T(B^*),\, \|N\| \le 1} \mu(A^*) \|P_{\Omega(A^*)}(N)\|_\infty \\
&\le \max_{N \in T(B^*),\, \|N\| \le 1} \mu(A^*) \|N\|_\infty \\
&\le \mu(A^*)\xi(B^*).
\end{aligned}
\]

Here the first inequality follows from the definition (1.2) of $\mu(A^*)$ as $P_{\Omega(A^*)}(N) \in
\Omega(A^*)$, the second inequality is due to the fact that $\|P_{\Omega(A^*)}(N)\|_\infty \le \|N\|_\infty$, and the
final inequality follows from the definition (1.1) of $\xi(B^*)$. □

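Proof of Proposition 2. We first show that $(A^*, B^*)$ is an optimum of (1.3),
before moving on to showing uniqueness. Based on subgradient optimality conditions
applied at $(A^*, B^*)$, there must exist a dual $Q$ such that

\[
Q \in \gamma\, \partial\|A^*\|_1 \quad \text{and} \quad Q \in \partial\|B^*\|_*.
\]

The second condition in this proposition guarantees the existence of a dual $Q$ that
satisfies both these subgradient conditions simultaneously (see (4.4) and (4.5)). Therefore,
we have that $(A^*, B^*)$ is an optimum. Next we show that under the conditions
specified in the lemma, $(A^*, B^*)$ is also a unique optimum. To avoid cluttered notation,
in the rest of this proof we let $\Omega = \Omega(A^*)$, $T = T(B^*)$, $\Omega^c(A^*) = \Omega^c$, and
$T^\perp(B^*) = T^\perp$.

Suppose that there is another feasible solution $(A^* + N_A, B^* + N_B)$ that is also
a minimizer. We must have that $N_A + N_B = 0$ because $A^* + B^* = C = (A^* + N_A) +
(B^* + N_B)$. Applying the subgradient property at $(A^*, B^*)$, we have that for any
subgradient $(Q_A, Q_B)$ of the function $\gamma\|A\|_1 + \|B\|_*$ (at $(A^*, B^*)$)

\[
\gamma\|A^* + N_A\|_1 + \|B^* + N_B\|_* \ge \gamma\|A^*\|_1 + \|B^*\|_* + \langle Q_A, N_A \rangle + \langle Q_B, N_B \rangle. \tag{B.2}
\]

Since $(Q_A, Q_B)$ is a subgradient of the function $\gamma\|A\|_1 + \|B\|_*$ at $(A^*, B^*)$, we must
have from (4.4) and (4.5) that

• $Q_A = \gamma\,\mathrm{sign}(A^*) + P_{\Omega^c}(Q_A)$, with $\|P_{\Omega^c}(Q_A)\|_\infty \le \gamma$.

• $Q_B = UV^T + P_{T^\perp}(Q_B)$, with $\|P_{T^\perp}(Q_B)\| \le 1$.

Using these conditions we rewrite $\langle Q_A, N_A \rangle$ and $\langle Q_B, N_B \rangle$. Based on the existence of
the dual $Q$ as described in the lemma, we have that

\[
\begin{aligned}
\langle Q_A, N_A \rangle &= \langle \gamma\,\mathrm{sign}(A^*) + P_{\Omega^c}(Q_A), N_A \rangle \\
&= \langle Q - P_{\Omega^c}(Q) + P_{\Omega^c}(Q_A), N_A \rangle \\
&= \langle P_{\Omega^c}(Q_A) - P_{\Omega^c}(Q), N_A \rangle + \langle Q, N_A \rangle,
\end{aligned} \tag{B.3}
\]

where we have used the fact that $Q = \gamma\,\mathrm{sign}(A^*) + P_{\Omega^c}(Q)$. Similarly, we have that

\[
\begin{aligned}
\langle Q_B, N_B \rangle &= \langle UV^T + P_{T^\perp}(Q_B), N_B \rangle \\
&= \langle Q - P_{T^\perp}(Q) + P_{T^\perp}(Q_B), N_B \rangle \\
&= \langle P_{T^\perp}(Q_B) - P_{T^\perp}(Q), N_B \rangle + \langle Q, N_B \rangle,
\end{aligned} \tag{B.4}
\]

where we have used the fact that $Q = UV^T + P_{T^\perp}(Q)$. Putting (B.3) and (B.4)
together, we have that

\[
\begin{aligned}
\langle Q_A, N_A \rangle + \langle Q_B, N_B \rangle &= \langle P_{\Omega^c}(Q_A) - P_{\Omega^c}(Q), N_A \rangle + \langle P_{T^\perp}(Q_B) - P_{T^\perp}(Q), N_B \rangle + \langle Q, N_A + N_B \rangle \\
&= \langle P_{\Omega^c}(Q_A) - P_{\Omega^c}(Q), N_A \rangle + \langle P_{T^\perp}(Q_B) - P_{T^\perp}(Q), N_B \rangle \\
&= \langle P_{\Omega^c}(Q_A) - P_{\Omega^c}(Q), P_{\Omega^c}(N_A) \rangle + \langle P_{T^\perp}(Q_B) - P_{T^\perp}(Q), P_{T^\perp}(N_B) \rangle.
\end{aligned} \tag{B.5}
\]

In the second equality, we used the fact that $N_A + N_B = 0$.

Since $(Q_A, Q_B)$ is any subgradient, we have some freedom in selecting $P_{\Omega^c}(Q_A)$
and $P_{T^\perp}(Q_B)$ as long as they still satisfy the subgradient conditions $\|P_{\Omega^c}(Q_A)\|_\infty \le \gamma$
and $\|P_{T^\perp}(Q_B)\| \le 1$. We set $P_{\Omega^c}(Q_A) = \gamma\,\mathrm{sign}(P_{\Omega^c}(N_A))$ so that $\|P_{\Omega^c}(Q_A)\|_\infty = \gamma$
and $\langle P_{\Omega^c}(Q_A), P_{\Omega^c}(N_A) \rangle = \gamma\|P_{\Omega^c}(N_A)\|_1$. Letting $P_{T^\perp}(N_B) = \tilde{U}\tilde{\Sigma}\tilde{V}^T$ be the singular
value decomposition of $P_{T^\perp}(N_B)$, we set $P_{T^\perp}(Q_B) = \tilde{U}\tilde{V}^T$ so that $\|P_{T^\perp}(Q_B)\| = 1$
and $\langle P_{T^\perp}(Q_B), P_{T^\perp}(N_B) \rangle = \|P_{T^\perp}(N_B)\|_*$. Consequently, we can simplify (B.5) as
follows:

\[
\langle Q_A, N_A \rangle + \langle Q_B, N_B \rangle \ge \left(\gamma - \|P_{\Omega^c}(Q)\|_\infty\right)\|P_{\Omega^c}(N_A)\|_1 + \left(1 - \|P_{T^\perp}(Q)\|\right)\|P_{T^\perp}(N_B)\|_*.
\]

Since $\|P_{\Omega^c}(Q)\|_\infty < \gamma$ and $\|P_{T^\perp}(Q)\| < 1$, we have that $\langle Q_A, N_A \rangle + \langle Q_B, N_B \rangle$ is
strictly positive unless $P_{\Omega^c}(N_A) = 0$ and $P_{T^\perp}(N_B) = 0$. (Note that if $\langle Q_A, N_A \rangle +
\langle Q_B, N_B \rangle > 0$ then $\gamma\|A^* + N_A\|_1 + \|B^* + N_B\|_* > \gamma\|A^*\|_1 + \|B^*\|_*$.) However, we have
that $N_A + N_B = 0$. If $P_{\Omega^c}(N_A) = P_{T^\perp}(N_B) = 0$, then $P_\Omega(N_A) + P_T(N_B) = 0$. In other
words, $P_\Omega(N_A) = -P_T(N_B)$. This can only be possible if $P_\Omega(N_A) = P_T(N_B) = 0$ (as
$\Omega \cap T = \{0\}$), which implies that $N_A = N_B = 0$. Therefore, $\gamma\|A^* + N_A\|_1 + \|B^* +
N_B\|_* > \gamma\|A^*\|_1 + \|B^*\|_*$ unless $N_A = N_B = 0$. □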
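The two subgradient choices used in the uniqueness argument rest on standard identities: pairing a matrix with $\gamma$ times its entrywise sign gives $\gamma$ times its $\ell_1$ norm, and pairing it with $\tilde{U}\tilde{V}^T$ from its own SVD gives its nuclear norm. The sketch below checks both on a generic random matrix `N`, which here stands in for $P_{\Omega^c}(N_A)$ and $P_{T^\perp}(N_B)$; the matrix and the value of `gamma` are arbitrary assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N = rng.standard_normal((4, 4))
gamma = 0.7

# l1 part: Q_A = gamma * sign(N) has entrywise max norm gamma and <Q_A, N> = gamma * ||N||_1.
QA = gamma * np.sign(N)
inner_l1 = np.sum(QA * N)
l1_norm = np.abs(N).sum()

# Nuclear part: Q_B = U V^T has spectral norm 1 and <Q_B, N> = ||N||_*.
U, s, Vt = np.linalg.svd(N)
QB = U @ Vt
inner_nuc = np.sum(QB * N)
nuclear_norm = s.sum()

print(inner_l1, gamma * l1_norm, inner_nuc, nuclear_norm)
```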

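Proof of Theorem 2. As with the previous proof, we avoid cluttered notation
by letting $\Omega = \Omega(A^*)$, $T = T(B^*)$, $\Omega^c(A^*) = \Omega^c$, and $T^\perp(B^*) = T^\perp$. One can check
that

\[
\xi(B^*)\mu(A^*) < \frac{1}{6} \;\Longrightarrow\; \frac{\xi(B^*)}{1 - 4\xi(B^*)\mu(A^*)} < \frac{1 - 3\xi(B^*)\mu(A^*)}{\mu(A^*)}. \tag{B.6}
\]

Thus, we show that if $\xi(B^*)\mu(A^*) < \frac{1}{6}$ then there exists a range of $\gamma$ for which a dual
$Q$ with the requisite properties exists. Also note that plugging in $\xi(B^*)\mu(A^*) = \frac{1}{6}$ in
the above range gives the smaller range $\left(3\xi(B^*), \frac{1}{2\mu(A^*)}\right)$ for $\gamma$. The geometric mean of
the extreme values gives $\gamma = \sqrt{\frac{3\xi(B^*)}{2\mu(A^*)}}$, which is always within the above range.

We aim to construct a dual $Q$ by considering candidates in the direct sum $\Omega \oplus T$
of the tangent spaces. Since $\mu(A^*)\xi(B^*) < \frac{1}{6}$, we can conclude from Proposition 1
that there exists a unique $Q \in \Omega \oplus T$ such that $P_\Omega(Q) = \gamma\,\mathrm{sign}(A^*)$ and $P_T(Q) = UV^T$
(recall that these are conditions that a dual must satisfy according to Proposition 2), as
$\Omega \cap T = \{0\}$. The rest of this proof shows that if $\mu(A^*)\xi(B^*) < \frac{1}{6}$ then the projections
of such a $Q$ onto $T^\perp$ and onto $\Omega^c$ will be small, i.e., we show that $\|P_{\Omega^c}(Q)\|_\infty < \gamma$
and $\|P_{T^\perp}(Q)\| < 1$.

We note here that $Q$ can be uniquely expressed as the sum of an element of $T$
and an element of $\Omega$, i.e., $Q = Q_\Omega + Q_T$ with $Q_\Omega \in \Omega$ and $Q_T \in T$. The uniqueness
of the splitting can be concluded because $\Omega \cap T = \{0\}$. Let $Q_\Omega = \gamma\,\mathrm{sign}(A^*) + \epsilon_\Omega$ and
$Q_T = UV^T + \epsilon_T$. We then have

\[
P_\Omega(Q) = \gamma\,\mathrm{sign}(A^*) + \epsilon_\Omega + P_\Omega(Q_T) = \gamma\,\mathrm{sign}(A^*) + \epsilon_\Omega + P_\Omega(UV^T + \epsilon_T).
\]

Since $P_\Omega(Q) = \gamma\,\mathrm{sign}(A^*)$,

\[
\epsilon_\Omega = -P_\Omega(UV^T + \epsilon_T). \tag{B.7}
\]

Similarly,

\[
\epsilon_T = -P_T(\gamma\,\mathrm{sign}(A^*) + \epsilon_\Omega). \tag{B.8}
\]

Next, we obtain the following bound on $\|P_{\Omega^c}(Q)\|_\infty$:

\[
\begin{aligned}
\|P_{\Omega^c}(Q)\|_\infty &= \|P_{\Omega^c}(UV^T + \epsilon_T)\|_\infty \\
&\le \|UV^T + \epsilon_T\|_\infty \\
&\le \xi(B^*)\|UV^T + \epsilon_T\| \\
&\le \xi(B^*)(1 + \|\epsilon_T\|),
\end{aligned} \tag{B.9}
\]

where we obtain the second inequality based on the definition of $\xi(B^*)$ (since $UV^T +
\epsilon_T \in T$). Similarly, we can obtain the following bound on $\|P_{T^\perp}(Q)\|$:

\[
\begin{aligned}
\|P_{T^\perp}(Q)\| &= \|P_{T^\perp}(\gamma\,\mathrm{sign}(A^*) + \epsilon_\Omega)\| \\
&\le \|\gamma\,\mathrm{sign}(A^*) + \epsilon_\Omega\| \\
&\le \mu(A^*)\|\gamma\,\mathrm{sign}(A^*) + \epsilon_\Omega\|_\infty \\
&\le \mu(A^*)(\gamma + \|\epsilon_\Omega\|_\infty),
\end{aligned} \tag{B.10}
\]

where we obtain the second inequality based on the definition of $\mu(A^*)$ (since $\gamma\,\mathrm{sign}(A^*) +
\epsilon_\Omega \in \Omega$). Thus, we can bound $\|P_{\Omega^c}(Q)\|_\infty$ and $\|P_{T^\perp}(Q)\|$ by bounding $\|\epsilon_T\|$ and
$\|\epsilon_\Omega\|_\infty$ respectively (using the relations (B.8) and (B.7)).
By definition of $\xi(B^*)$ and using (B.7),

\[
\begin{aligned}
\|\epsilon_\Omega\|_\infty &= \|P_\Omega(UV^T + \epsilon_T)\|_\infty \\
&\le \|UV^T + \epsilon_T\|_\infty \\
&\le \xi(B^*)\|UV^T + \epsilon_T\| \\
&\le \xi(B^*)(1 + \|\epsilon_T\|),
\end{aligned} \tag{B.11}
\]

where the second inequality is obtained because $UV^T + \epsilon_T \in T$. Similarly, by definition
of $\mu(A^*)$ and using (B.8),

\[
\begin{aligned}
\|\epsilon_T\| &= \|P_T(\gamma\,\mathrm{sign}(A^*) + \epsilon_\Omega)\| \\
&\le 2\|\gamma\,\mathrm{sign}(A^*) + \epsilon_\Omega\| \\
&\le 2\mu(A^*)\|\gamma\,\mathrm{sign}(A^*) + \epsilon_\Omega\|_\infty \\
&\le 2\mu(A^*)(\gamma + \|\epsilon_\Omega\|_\infty),
\end{aligned} \tag{B.12}
\]

where the first inequality is obtained because $\|P_T(M)\| \le 2\|M\|$, and the second
inequality is obtained because $\gamma\,\mathrm{sign}(A^*) + \epsilon_\Omega \in \Omega$.
Putting (B.11) in (B.12), we have that

\[
\|\epsilon_T\| \le 2\mu(A^*)\left(\gamma + \xi(B^*)(1 + \|\epsilon_T\|)\right)
\;\Longrightarrow\;
\|\epsilon_T\| \le \frac{2\gamma\mu(A^*) + 2\xi(B^*)\mu(A^*)}{1 - 2\xi(B^*)\mu(A^*)}. \tag{B.13}
\]

Similarly, putting (B.12) in (B.11), we have that

\[
\|\epsilon_\Omega\|_\infty \le \xi(B^*)\left(1 + 2\mu(A^*)(\gamma + \|\epsilon_\Omega\|_\infty)\right)
\;\Longrightarrow\;
\|\epsilon_\Omega\|_\infty \le \frac{\xi(B^*) + 2\gamma\xi(B^*)\mu(A^*)}{1 - 2\xi(B^*)\mu(A^*)}. \tag{B.14}
\]

We now show that $\|P_{T^\perp}(Q)\| < 1$. Combining (B.14) and (B.10),

\[
\begin{aligned}
\|P_{T^\perp}(Q)\| &\le \mu(A^*)\left(\gamma + \frac{\xi(B^*) + 2\gamma\xi(B^*)\mu(A^*)}{1 - 2\xi(B^*)\mu(A^*)}\right) \\
&= \mu(A^*)\left(\frac{\gamma + \xi(B^*)}{1 - 2\xi(B^*)\mu(A^*)}\right) \\
&< \mu(A^*)\left(\frac{\frac{1 - 3\xi(B^*)\mu(A^*)}{\mu(A^*)} + \xi(B^*)}{1 - 2\xi(B^*)\mu(A^*)}\right) \\
&= 1,
\end{aligned}
\]

since $\gamma < \frac{1 - 3\xi(B^*)\mu(A^*)}{\mu(A^*)}$ by assumption.

Finally, we show that $\|P_{\Omega^c}(Q)\|_\infty < \gamma$. Combining (B.13) and (B.9),

\[
\begin{aligned}
\|P_{\Omega^c}(Q)\|_\infty &\le \xi(B^*)\left(1 + \frac{2\gamma\mu(A^*) + 2\xi(B^*)\mu(A^*)}{1 - 2\xi(B^*)\mu(A^*)}\right) \\
&= \xi(B^*)\left(\frac{1 + 2\gamma\mu(A^*)}{1 - 2\xi(B^*)\mu(A^*)}\right) \\
&= \left[\xi(B^*)\left(\frac{1 + 2\gamma\mu(A^*)}{1 - 2\xi(B^*)\mu(A^*)}\right) - \gamma\right] + \gamma \\
&= \left[\frac{\xi(B^*) + 2\gamma\xi(B^*)\mu(A^*) - \gamma + 2\gamma\xi(B^*)\mu(A^*)}{1 - 2\xi(B^*)\mu(A^*)}\right] + \gamma \\
&= \left[\frac{\xi(B^*) - \gamma\left(1 - 4\xi(B^*)\mu(A^*)\right)}{1 - 2\xi(B^*)\mu(A^*)}\right] + \gamma \\
&< \left[\frac{\xi(B^*) - \xi(B^*)}{1 - 2\xi(B^*)\mu(A^*)}\right] + \gamma \\
&= \gamma.
\end{aligned}
\]

Here, we used the fact that $\frac{\xi(B^*)}{1 - 4\xi(B^*)\mu(A^*)} < \gamma$ in the second inequality. □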
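The algebra at the end of this proof can be checked numerically by substituting the bounds (B.13) and (B.14) into (B.9) and (B.10) and sweeping $\gamma$ over the admissible interval. The sketch below does this for one hypothetical pair of values for $\mu(A^*)$ and $\xi(B^*)$; the function name and the chosen numbers are assumptions of the example, not quantities from the paper.

```python
def certificate_bounds(mu, xi, gamma):
    """Upper bounds on ||P_{Omega^c}(Q)||_inf and ||P_{T^perp}(Q)|| from (B.9)-(B.14)."""
    x = mu * xi
    eps_T = (2.0 * gamma * mu + 2.0 * x) / (1.0 - 2.0 * x)   # bound (B.13)
    eps_Omega = (xi + 2.0 * gamma * x) / (1.0 - 2.0 * x)     # bound (B.14)
    proj_Omega_c = xi * (1.0 + eps_T)                        # bound (B.9)
    proj_T_perp = mu * (gamma + eps_Omega)                   # bound (B.10)
    return proj_Omega_c, proj_T_perp

mu, xi = 2.0, 0.05                      # hypothetical values with mu * xi < 1/6
lower = xi / (1.0 - 4.0 * mu * xi)
upper = (1.0 - 3.0 * mu * xi) / mu

# Sweep gamma strictly inside (lower, upper) and report the two bounds.
for i in range(1, 10):
    gamma = lower + (upper - lower) * i / 10.0
    a, b = certificate_bounds(mu, xi, gamma)
    print(round(gamma, 4), a < gamma, b < 1.0)
```

Both comparisons hold at every interior grid point, and the second bound reaches 1 exactly when $\gamma$ equals the upper endpoint, which is why the interval is open.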

Proof of Proposition 3. Based on the Perron-Frobenius theorem [17], one can
conclude that $\|P\| \ge \|Q\|$ if $P_{i,j} \ge |Q_{i,j}|$, $\forall\, i, j$. Thus, we need only consider the
matrix that has 1 in every location in the support set $\Omega(A)$ and 0 everywhere else.
Based on the definition of the spectral norm, we can re-write $\mu(A)$ as follows:

\[
\mu(A) = \max_{\|x\|_2 = 1,\, \|y\|_2 = 1} \; \sum_{(i,j) \in \Omega(A)} x_i y_j. \tag{B.15}
\]

Without loss of generality we restrict our attention to optima that are achieved by
element-wise non-negative vectors $x, y$.

Upper bound. Since the reformulation of $\mu(A)$ above involves the maximization
of a continuous function over a compact set, the maximum is achieved at some point
in the constraint set. Therefore, we have that any optimal $(x, y)$ must satisfy the
following necessary optimality conditions: There exist Lagrange multipliers $\lambda_1, \lambda_2$
such that

\[
\nabla_x \sum_{(i,j) \in \Omega(A)} x_i y_j = 2\lambda_1 x, \qquad \nabla_y \sum_{(i,j) \in \Omega(A)} x_i y_j = 2\lambda_2 y.
\]

This reduces to the following system of equations:

\[
\sum_{j : (i,j) \in \Omega(A)} y_j = 2\lambda_1 x_i, \quad \forall\, i, \tag{B.16}
\]
\[
\sum_{i : (i,j) \in \Omega(A)} x_i = 2\lambda_2 y_j, \quad \forall\, j. \tag{B.17}
\]

Multiplying the first system of equations (B.16) element-wise by $x$ and then summing,
we have that

\[
\sum_{(i,j) \in \Omega(A)} x_i y_j = 2\lambda_1.
\]

Similarly, we have that $\sum_{(i,j) \in \Omega(A)} x_i y_j = 2\lambda_2$, which implies that the Lagrange
multipliers are equal to each other and to one-half of the optimal value attained:

\[
2\lambda_1 = 2\lambda_2 = \sum_{(i,j) \in \Omega(A)} x_i y_j \triangleq 2\lambda.
\]

We recall here that the optimal points $x, y$ are element-wise non-negative. Let $\sigma$
denote the element-wise sum of the optimal points $x, y$:

\[
\sigma = \sum_i x_i + \sum_j y_j.
\]

Summing over all $i$ in (B.16) and all $j$ in (B.17), we have that

\[
\begin{aligned}
& \sum_i \sum_{j : (i,j) \in \Omega(A)} y_j + \sum_j \sum_{i : (i,j) \in \Omega(A)} x_i = 2\lambda \times \sigma \\
\Rightarrow\; & \sum_{(i,j) \in \Omega(A)} y_j + \sum_{(i,j) \in \Omega(A)} x_i = 2\lambda \times \sigma \\
\Rightarrow\; & \sum_j \deg_{\max}(A)\, y_j + \sum_i \deg_{\max}(A)\, x_i \ge 2\lambda \times \sigma \\
\Rightarrow\; & \deg_{\max}(A) \times \sigma \ge 2\lambda \times \sigma \\
\Rightarrow\; & \deg_{\max}(A) \ge 2\lambda = \sum_{(i,j) \in \Omega(A)} x_i y_j.
\end{aligned}
\]

Note that we used the fact that $\sigma \ne 0$. Thus, we have that $\mu(A) \le \deg_{\max}(A)$.

Lower bound. Now suppose that each row/column of $A$ has at least $\deg_{\min}(A)$
non-zero entries. Using the reformulation (B.15) of $\mu(A)$ above, we have that

\[
\mu(A) \ge \sum_{(i,j) \in \Omega(A)} \frac{1}{\sqrt{n}} \frac{1}{\sqrt{n}} = \frac{|\mathrm{support}(A)|}{n} \ge \deg_{\min}(A).
\]

Here we set $x = y = \frac{1}{\sqrt{n}} \mathbf{1}$, with $\mathbf{1}$ representing the all-ones vector, as candidates in
the optimization problem (B.15). □

Proof of Proposition 4. Let $B = U\Sigma V^T$ be the SVD of $B$.

Upper bound. We can upper-bound $\xi(B)$ as follows

\[
\begin{aligned}
\xi(B) &= \max_{M \in T(B),\, \|M\| \le 1} \|M\|_\infty \\
&\le \max_{\|M\| \le 1} \|P_{T(B)}(M)\|_\infty \\
&\le \max_{M \text{ unitary}} \|P_{T(B)}(M)\|_\infty \\
&\le \max_{M \text{ unitary}} \|P_U M\|_\infty + \max_{M \text{ unitary}} \|(I_{n \times n} - P_U) M P_V\|_\infty.
\end{aligned}
\]

The first inequality holds because every $M \in T(B)$ satisfies $M = P_{T(B)}(M)$.
For the second inequality, we have used the fact that the maximum of a convex
function over a convex set is achieved at one of the extreme points of the constraint set.
The unitary matrices are the extreme points of the set of contractions (i.e., matrices
with spectral norm $\le 1$). We have used $P_{T(B)}(M) = P_U M + M P_V - P_U M P_V$ from
(4.1) in the last inequality, where $P_U = UU^T$ and $P_V = VV^T$ denote the projections
onto the spaces spanned by $U$ and $V$ respectively.
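The projection formula $P_{T(B)}(M) = P_U M + M P_V - P_U M P_V$ used above can be verified numerically on a small example. The sketch below checks three properties: that $B$ itself is left unchanged, that applying the map twice is the same as applying it once, and that the spectral norm grows by at most a factor of 2 (the fact $\|P_T(M)\| \le 2\|M\|$ used in the proof of Theorem 2). The function name, the dimensions and the random inputs are assumptions of the example.

```python
import numpy as np

def tangent_projection(U, V, M):
    """P_T(M) = P_U M + M P_V - P_U M P_V with P_U = U U^T and P_V = V V^T."""
    PU = U @ U.T
    PV = V @ V.T
    return PU @ M + M @ PV - PU @ M @ PV

rng = np.random.default_rng(3)
n, k = 6, 2
U, _ = np.linalg.qr(rng.standard_normal((n, k)))   # orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((n, k)))
B = U @ np.diag([2.0, 1.0]) @ V.T                  # rank-2 matrix with these singular vectors
M = rng.standard_normal((n, n))

PM = tangent_projection(U, V, M)
print(np.linalg.norm(PM, 2), 2 * np.linalg.norm(M, 2))
```

The factor 2 comes from writing $P_T(M) = P_U M + (I - P_U) M P_V$, a sum of two terms each with spectral norm at most $\|M\|$.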

We have the following simple bound for $\|P_U M\|_\infty$ with $M$ unitary:

\[
\begin{aligned}
\max_{M \text{ unitary}} \|P_U M\|_\infty &= \max_{M \text{ unitary}} \max_{i,j} \; e_i^T P_U M e_j \\
&\le \max_{M \text{ unitary}} \max_{i,j} \; \|P_U e_i\|_2 \, \|M e_j\|_2 \\
&= \max_i \|P_U e_i\|_2 \times \max_{M \text{ unitary}} \max_j \|M e_j\|_2 \\
&= \beta(U).
\end{aligned} \tag{B.18}
\]

Here we used the Cauchy-Schwarz inequality in the second line, and the definition
of $\beta$ from (4.6) in the last line.
Similarly, we have that

\[
\begin{aligned}
\max_{M \text{ unitary}} \|(I_{n \times n} - P_U) M P_V\|_\infty &= \max_{M \text{ unitary}} \max_{i,j} \; e_i^T (I_{n \times n} - P_U) M P_V e_j \\
&\le \max_{M \text{ unitary}} \max_{i,j} \; \|(I_{n \times n} - P_U) e_i\|_2 \, \|M P_V e_j\|_2 \\
&= \max_i \|(I_{n \times n} - P_U) e_i\|_2 \times \max_{M \text{ unitary}} \max_j \|M P_V e_j\|_2 \\
&\le 1 \times \max_j \|P_V e_j\|_2 \\
&= \beta(V).
\end{aligned} \tag{B.19}
\]

Using the definition of $\mathrm{inc}(B)$ from (4.7) along with (B.18) and (B.19), we have
that

\[
\xi(B) \le \beta(U) + \beta(V) \le 2\,\mathrm{inc}(B).
\]

Lower bound. Next we prove a lower bound on $\xi(B)$. Recall the definition of the
tangent space $T(B)$ from (3.2). We restrict our attention to elements of the tangent
space $T(B)$ of the form $P_U M = UU^T M$ for $M$ unitary (an analogous argument follows
for elements of the form $M P_V$ for $M$ unitary). One can check that

\[
\|P_U M\| = \max_{\|x\|_2 = 1,\, \|y\|_2 = 1} x^T P_U M y \le \max_{\|x\|_2 = 1} \|P_U x\|_2 \, \max_{\|y\|_2 = 1} \|M y\|_2 \le 1.
\]

Therefore,

\[
\xi(B) \ge \max_{M \text{ unitary}} \|P_U M\|_\infty.
\]

Thus, we only need to show that the inequality in line (2) of (B.18) is achieved by
some unitary matrix $M$ in order to conclude that $\xi(B) \ge \beta(U)$. Define the "most
aligned" basis vector with the subspace $U$ as follows:

\[
i^* = \arg\max_i \|P_U e_i\|_2.
\]

Let $M$ be any unitary matrix with one of its columns equal to $\frac{1}{\beta(U)} P_U e_{i^*}$, i.e., a
normalized version of the projection onto $U$ of the most aligned basis vector. One can
check that such a unitary matrix achieves equality in line (2) of (B.18). Consequently,
we have that

\[
\xi(B) \ge \max_{M \text{ unitary}} \|P_U M\|_\infty = \beta(U).
\]

By a similar argument with respect to $V$, we have the lower bound as claimed in the
proposition. □